


Page 1: Creating a pragmatic pan-European framework for permanent access to the records of science

Peter Tindemans, Geneva, 10-07-06 1

Creating a pragmatic pan-European framework for permanent access to the records of science

Dr. Peter Tindemans

Chairman Task Force Permanent Access

Summary

3 ICT infrastructures: networks, High Performance Computing, GRIDs

A 4th one will affect science as profoundly: an ‘infrastructure’ to provide long-term preservation of and access (“P&A”) to Records of Science (and digital heritage in general)

What does this look like? How should we build it?

Major stakeholders from science, libraries and archives offer their strategic commitment to national governments and the EU to create sufficient momentum in 3-5 years

Overview

1. Background: including Records of Science in Digital Heritage; Task Force Permanent Access

2. The problem and the range of technical solutions required

3. High-level, strategic, pragmatic approach: need and essence

4. What does a ‘European Digital Infrastructure for Preservation of and Access to Records of Science’ look like?

5. Alliance in the making

6. Financial challenge

7. Some issues

1. Background

Documents (+ images): libraries (+ NASA, ESA)

Audiovisual media, cultural heritage: broadcasting organisations, museums, ...

Scientific and operational data: labs, communities, scientists, service providers, ...

Hence ‘curation’: digital libraries, digital archives, digital repositories; preservation or perennial access

Background in terms of process and getting political attention

Political attention for preservation focused on cultural heritage and on libraries (legal deposit)

Recently, the inclusion of records of science in digital cultural heritage has evolved from ‘records of the history of science’ to ‘records of science in operation’; this concerns not just the scope but also the nature of the records: ‘data’ next to ‘documents’ (and other cultural physical artefacts)

A particular culmination point: the EU conference “Permanent Access to the Records of Science” (National Library of the Netherlands (KB), Netherlands EU Presidency), 1st November 2004, The Hague.

Participants agreed on the need to create a European infrastructure for long-term preservation of, and permanent access to, records of science. The KB was urged to create a Task Force.

Composition of the Task Force

Bertil Andersson, Chief Executive, European Science Foundation;

Lynne Brindley, Chief Executive, The British Library;

Wim van Drimmelen, Director General, Koninklijke Bibliotheek;

Norbert Kroo, Secretary-General, Hungarian Academy of Sciences;

Wolffried Stucky, professor, Institute of Applied Informatics and Formal Description Methods, Karlsruhe University; curator, Max Planck Institute of Computer Science, Germany;

Malcolm Read, Executive Secretary, Joint Information Systems Committee, UK;

Vincenzo Beruti, ESA/ESRIN;

John Wood, Chief Executive, Council for the Central Laboratory of the Research Councils, UK;

Peter Hendriks, Board, Springer Science and Business Media; Executive Board, International Association of Scientific, Technical and Medical Publishers;

Tomas Lidman, Director General, The National Archives of Sweden;

Peter Tindemans, chair, on behalf of the Koninklijke Bibliotheek.

(The composition reflects both ‘data’ and ‘documents’.)

2. Problem: science angle

Individual scientists: maintaining and accessing databases built up by individuals, e.g. the Maddison database on GDP; requirements of journals and funders with regard to supplementary or original data; a new cultural paradigm challenges individuals, universities, funders, etc.

Large research organisations and communities (CERN, ESA, ...): volume of data

European social sciences data archives

Libraries and archives angle (“perennial storage”): acidification threatened paper; obsolescence and volume explosion jeopardise digital heritage

Other cultural heritage organisations: similar to libraries and archives

Dimensions of the problem

Technical: digital data are unstable and perishable; migration (or other techniques to ensure permanent accessibility when software and hardware change); constant care and intervention; interoperability; volume

Economic: cost estimates; business model; public-good nature; relation to Open Access; building into the normal R&D funding model

Digital rights/access management

Organisational, including a ‘data model of the world’ (producing and re-using data)

Range of solutions required: an RDD programme for curation in general and preservation in particular

Storing petabytes to hundreds of petabytes to exabytes, surviving changes in hardware and software technologies, and retrieving information

Standardised approach to describe information (metadata) and management of information as successive ‘virtualisation’ layers (hardware, data, knowledge, workflows, trust, management) to enable fully automated, distributed solutions

Complex dynamic datasets and databases

Legal solutions for digital access and rights management

Economic business models based on value-chain analysis and public-good aspects

Technical tools, e.g. to avoid a ‘museum of old ICT technologies’ and cumbersome migration

Digital preservation methods suggested (Thibodeau, 2002)

What has happened?

Documents (+images): research libraries (esp. US), deposit libraries (esp. Europe: BL, KB)

Data: individual labs, scientists, networks of archives

Some national efforts: UK (e.g. JISC, Digital Preservation Coalition (+ Digital Curation Centre)); Germany (NESTOR) emerging; Netherlands

Some EU-funded projects: scattered, focus often on co-ordination

KB: back-up arrangements with several large scientific publishers

Some global co-operation, some efforts at standardisation

3. High-level, strategic, pragmatic approach: need

Increasing awareness among experts, and sometimes institutions, of the size and complexity of the P&A problem.

Many projects, standards, good practices, etc.

But: no recognition in Europe that preserving the digital heritage and making it accessible on very long time scales is a strategic issue for organisations (with few exceptions), governments, as well as many private-sector parties

No financing mechanism

In the USA, since 2002: the National Digital Information Infrastructure and Preservation Program, with the Library of Congress working together with NSF, research libraries, archives, etc.; 100 M$ to start with; recently NARA got 300 M$ (emphasis still on documents).

High-level approach: essence

1. Make ‘digital heritage’ stakeholders understand, at board level, the economic and cultural importance of P&A for their strategic development

2. Involve public and private parties: it is essential to find a business model based on private and user interests and cost allocations; public infrastructure: an important ‘public good’ aspect

3. Adopt a non-technical ‘model of the world’ as the basis for the ‘infrastructure’

4. Adopt a practical way ahead:

a. Where is the highest impact possible?

b. Initially involve not too few, not too many stakeholders

c. Connect to ongoing activities: don’t replace, but integrate responsibilities

a. Highest impact: focus on Records of Science, taken in a broad sense

Two worlds

1. Cultural heritage: begins to include digital heritage (UNESCO); ‘memory institutions’: archives, deposit libraries, museums. Science = ‘records of the history of science’. Politically increasingly visible: UNESCO, EU

2. Records of science: S&T in the digital age: ‘data’ next to ‘documents’; only a small part spills over into traditional archives. ‘Science’ = S + T; NSE + BMS + SSH; large-scale data collection for operational services and science (meteorology, GIS, census, …); experiments + observations + simulations + surveys + census and poll + history records; data also include ‘enriched’ and ‘curated’ data: “knowledge preservation”

Gearing up is best done by focusing on ‘Records of Science’:

Greatest momentum: inherent needs of the scientific community and organisations

High ‘specific mass’ (including financial mass)

Covers a broad field: academic and deposit libraries and scientific publishers straddle the two worlds; archives are linked to, e.g., the historical, social and economic sciences.

b. Not too few, not too many

Common European approaches: ‘call for tender’ for projects; all-inclusive approach: all stakeholders from 25 member states plus the Commission; resolutions, communications, agency, ...

Instead, focus on a critical mass of stakeholders and focused action, i.e.:

emphasis on preservation (though preservation cannot be separated from building digital collections);

aim to create an ‘infrastructure’;

aim to create growing consensus among, and conditions for, ‘communities’ and organisations and their particular preservation projects.

4. Model of the world: framework of conditions and rules of conduct

‘Communities’ produce science (particle physics; social sciences; astronomy/space science; geophysics/oceanography/earth sciences/earth observation; ...). They differ, but have similar structural elements to house the “Records of Science”: in some disciplines, a short-term role for individual researchers; ‘laboratories’; specialised data providers; specialised publishers or web-based archives; specialised research libraries

A cross-cutting horizontal structure exists too: scientific publishers, multidisciplinary open archives, academic research libraries, deposit libraries, conventional archives

In the digital world, all of these are digital archives or repositories

[Diagram: Communities A, B and C each comprise labs, special data providers, special publishers and special research libraries, with community-specific provisions. Beneath them runs a horizontal layer of general scientific publishers, general open archives, academic research libraries, deposit libraries and conventional archives, with cross-disciplinary, cross-community conditions, mechanisms and provisions.]

Transform this into a framework (‘infrastructure’) of real-life organisations and operating conditions (for interoperability and collaboration)

1. Identify a set of core physical digital archives in a limited number of initial communities, and in the horizontal layer (‘critical mass’ and ‘high specific mass’ are essential criteria)

2. These must be OAIS-compliant to ensure proper archiving, interoperability and long-term preservation

3. A framework for metadata, a framework for persistent identifiers, and a number of registries

4. Cost-effective preservation methods and services must be available

5. Common framework of principles and guidelines for management of access and rights (underlying the technical tools to implement this framework)

6. Financial mechanism for developing and testing implementation tools, techniques and services

7. a. Certification service providers, accredited according to b. a common European accreditation mechanism.
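Item 3 calls for a framework for persistent identifiers and a number of registries. As an illustration only, a minimal Python sketch of such a registry, assuming a simple model in which a stable identifier maps to an object’s current storage location and an archive updates the entry whenever it migrates or moves the object. The class name, its methods and the identifier and location strings are hypothetical and do not describe any existing identifier scheme.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class IdentifierRegistry:
    """Hypothetical registry mapping persistent identifiers to current locations."""

    locations: dict[str, str] = field(default_factory=dict)
    history: dict[str, list[str]] = field(default_factory=dict)

    def register(self, pid: str, location: str) -> None:
        # A persistent identifier is assigned once and never reused.
        if pid in self.locations:
            raise ValueError(f"identifier already registered: {pid}")
        self.locations[pid] = location
        self.history[pid] = [location]

    def relocate(self, pid: str, new_location: str) -> None:
        # Called after an archive migrates or moves the object:
        # the location changes, the identifier does not.
        if pid not in self.locations:
            raise KeyError(f"unknown identifier: {pid}")
        self.locations[pid] = new_location
        self.history[pid].append(new_location)

    def resolve(self, pid: str) -> str:
        # Citing documents and users only ever hold the identifier.
        return self.locations[pid]


if __name__ == "__main__":
    registry = IdentifierRegistry()
    registry.register("pid:example/0001", "archive-a/store/0001-v1")
    registry.relocate("pid:example/0001", "archive-b/store/0001-v2")
    print(registry.resolve("pid:example/0001"))  # archive-b/store/0001-v2
```

The point of the sketch is the separation of naming from location: references hold only the identifier, so moving or migrating an object means updating one registry entry rather than every citation.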

5. An Alliance in the making

Aims:

Establish wide consensus on a framework (‘infrastructure’) for LTPA; initial focus on science

Significantly accelerate the creation of its main building blocks

Work with national governments and the EU to strengthen European strategies, policies and their implementation

Strengthen the role of European parties world-wide

Articulate and maintain an ongoing R&D&D programme

3-5 years

A “Rolling Stone”

Tasks

Assisting communities: the initial core set and others

Enhancing and consolidating consensus on the building blocks of the ‘infrastructure’

Helping establish a European funding mechanism

Helping establish a European accreditation mechanism

Liaising with national governments and the EU

Promoting sustainable business models

Raising awareness: funding bodies, professional societies, universities, ...

Core Alliance Partners

European Science Foundation

Some of the most active libraries: British Library, KB

Some major scientific organisations: ESF, ESA, CERN, EMBL (EIROFORUM), CCLRC, Max Planck Gesellschaft and CESSDA are among those approached

Association of Scientific, Technical and Medical Publishers

Some major national archives

JISC; ‘national coalitions’ for P&A, where they exist: UK, Germany, ...

Corporate associate members (e.g. ICT industry): ‘customer-contractor’ principle

Strengthening emerging consensus; building on what is being done

Conceptualisation and standardisation, e.g.:

OAIS

Dublin Core Metadata Initiative (but still very much library/document-oriented)

Draft Audit Checklist for Certification of Trusted Digital Repositories (RLG, NARA plus European experts)

Practical development and implementation, e.g.:

several EU-funded projects (but too much focus on co-ordination); important new ones: DRIVER, CASPAR (with e.g. CCLRC, ESA-ESRIN)

strong national projects (but in a few countries only), e.g. DARE (Netherlands)

public-private agreements (e.g. libraries and publishers)

Audit and Certification of Digital Archives Project (CRL) to test the audit on 3 archives

6. Financial model

1. Need to create a European, but strongly distributed, infrastructure;

2. Need to make Europe a visible, strong partner in global efforts.

Therefore:

Partners continue current efforts and investments

Partners contribute to establishing a small European organisation to co-ordinate Alliance efforts

‘100 M€’ for the real action, through a European funding mechanism: not to be disbursed by the Alliance; for developments in the communities the Alliance will work with; to create the enabling conditions

Leveraging national and further European funding (the central plus decentralised funding totals needed to build this infrastructure are much higher than 100 M€)

Practicalities about the Alliance

Members: leading national or international organisations

Strategic allies: national coalitions or competence networks; commercial companies or vendors

Board; director and some staff

Office in Brussels (at ESF’s COST office?)

Budget for 3 years: 1.8 M€

Per partner: ~75 k€

Workplan

Year 1

Interface with the EU (FP7), ESFRI and national organisations

Facilitate information sharing about preservation approaches and support infrastructure (standards, authentication, registries, metadata capture mechanisms, ...)

Gathering cost information

Involvement in the ongoing drafting of an archive certification standard

Identifying resources for science-driven interoperability as a potential basis for automated interoperability

Year 2 (apart from continuing interfacing)

Shared persistent identifier scheme

Prototype interoperable search and discovery tools supported by common data models

Certification standard ready for submission to ISO; preliminary work on an accreditation organisation

Some alignment of operating practices and of the use of Digital Rights Management and Authentication and Authorisation systems

Year 3 (apart from continuing interfacing)

Prototyping and testbed activities to put some into production use (e.g. applications to find and combine data and relevant publication material, supported by shared catalogues and data models, single sign-on access to non-public data, etc.)

Co-operation on large-scale storage solutions (exabyte)

Finalise the business model for the accreditation system

Year 4+ (only after evaluation)

Advanced development of interoperable virtualisation layers

7. Some Issues

‘Raw’ data: so far much of the focus has been on documents

Model of the world: primarily based on international communities, or on national approaches (cf. networking with NRENs, connected via GEANT)

Alliance to be set up as a corporation or, e.g., as a consortium

How to get the EU involved in a strategic (not individual project-based) and internally co-ordinated approach?