

WEB ARCHIVING IN THE BRITISH LIBRARY

John Tuck

Head of British Collections

February 2004


BRITISH LIBRARY: CONTEXT

Created by British Library Act 1972.

National Library of the United Kingdom.

Origins from 1753.

One of the world’s greatest research libraries.

160 million collection items.


BRITISH LIBRARY: COLLECTION DEVELOPMENT

Building as completely as possible the UK national published archive - current and retrospective gap filling; print and electronic.

Collecting research-level English-language material published world-wide in the humanities, social sciences, and science, technology and medicine (STM).

Buying foreign-language material selectively.

Material acquired through: legal deposit, voluntary deposit from publishers, purchase, donation, exchange.


LEGISLATION

Legal Deposit Libraries Act 2003: enabling legislation.

VDEP: Voluntary Deposit of Electronic Publications.


DOMAIN.UK

Six-month experiment to select and capture 100 UK web-sites, 2001.

Audit change, loss, links, etc.

Determine next steps.


DOMAIN.UK: Why?

Short-lived nature/changing content of many web-sites.

Loss of information.

Increasing reference to web-sites in research/scholarship.


DOMAIN.UK: Voluntary/Rights Cleared Approach

Voluntary.

Requiring explicit agreement of website publishers to take part in pilot.

No public access.


DOMAIN.UK: Selection

Websites of historical or cultural significance.

Cross-section of Dewey Decimal Classification.


DOMAIN.UK: Process

E-mail selected sites for approval and to check whether already archived.

Measure sites for links, size, change, etc.

Frequency of visits: every three weeks or more in some cases.

Supported by those sites approached.

Report recommended scaling up.
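
The slides do not say how sites were measured for links, size and change. As an illustration only, one capture of a page could be measured along the following lines. This is a minimal sketch, not the pilot's actual method: the names `LinkCounter`, `measure` and `has_changed` are hypothetical, and comparing content digests is an assumed way of detecting change between visits.

```python
import hashlib
from html.parser import HTMLParser


class LinkCounter(HTMLParser):
    """Collects the href of every <a> tag in an HTML page (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def measure(html: str) -> dict:
    """Return link count, size in bytes and a content digest for one capture."""
    parser = LinkCounter()
    parser.feed(html)
    data = html.encode("utf-8")
    return {
        "links": len(parser.links),
        "size_bytes": len(data),
        "digest": hashlib.sha256(data).hexdigest(),
    }


def has_changed(earlier: dict, later: dict) -> bool:
    """Assumption: a differing digest means the page changed between visits."""
    return earlier["digest"] != later["digest"]


# Two made-up captures of the same page, taken on successive visits.
visit_1 = measure('<html><body><a href="/about">About</a></body></html>')
visit_2 = measure(
    '<html><body><a href="/about">About</a><a href="/news">News</a></body></html>'
)
print(visit_1["links"], visit_2["links"], has_changed(visit_1, visit_2))
```

A digest comparison flags any byte-level difference, so a real audit would probably need to ignore trivial changes such as timestamps or rotating banners.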


BRITISH LIBRARY WEB ARCHIVING PROGRAMME

Building on Domain.uk.

BL to play leading role in collecting UK web presence in partnership with other institutions nationally and internationally.

Selective approach.


BRITISH LIBRARY WEB ARCHIVING PROGRAMME contd.

Co-ordinate a snapshot of entire UK web presence at occasional intervals.

Achieve more regular capture of limited and well-defined range of sites.

Sites judged to be research-level, whether in terms of stated intentions of sites themselves or of potential to be primary resources for research.


WEB ARCHIVING PROGRAMME

Comprises a series of complementary projects and activities.

Run entirely on a voluntary, rights-cleared basis pending secondary legal deposit legislation.

Aims to embed web archiving within the BL's overall collection development policy.

Aims to provide the infrastructure to collect, preserve and make accessible web-site material alongside material in other formats.


WEB ARCHIVING PROGRAMME STRANDS

Four main strands:

Definition of collection development policy.

UK Web Archiving Consortium.

International Internet Preservation Consortium.

Internet Archive: incunabula of the internet.


COLLECTION DEVELOPMENT

Appointment of Curator, Web Archiving.

Extension of policy defined for Domain.uk.

Sites of national, historical and cultural significance.

Research level now/in the future.


UK WEB ARCHIVING CONSORTIUM

Two-year project.

Six partners: BL (lead), National Library of Scotland, National Library of Wales, National Archives, Joint Information Systems Committee, Wellcome Library.

Plan to use PANDAS software developed by National Library of Australia.

Rights to use individual sites to be cleared with rights-holders.


UK WEB ARCHIVING CONSORTIUM contd.

Procurement exercise in progress to select a supplier to host the service.

Intention to let contract in April 2004 and to be operational in summer 2004.

Sites to be made accessible to users.

Each partner to collect up to 500 sites per year, i.e. 6,000 during project.
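
The 6,000 figure follows from numbers given on the consortium slides: six partners, a two-year project, and up to 500 sites per partner per year. A short check of that arithmetic, using a hypothetical helper `project_ceiling`:

```python
# Figures taken from the slides: six partners, two-year project,
# up to 500 sites per partner per year.
PARTNERS = 6
SITES_PER_PARTNER_PER_YEAR = 500
PROJECT_YEARS = 2


def project_ceiling(partners: int, sites_per_year: int, years: int) -> int:
    """Upper bound on sites collected across the whole consortium."""
    return partners * sites_per_year * years


total = project_ceiling(PARTNERS, SITES_PER_PARTNER_PER_YEAR, PROJECT_YEARS)
print(total)  # 6000
```

The result is a ceiling rather than a forecast, since each partner may collect fewer than 500 sites in a given year.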


INTERNATIONAL INTERNET PRESERVATION CONSORTIUM

Project involving national libraries.

Led by Bibliothèque nationale de France.

Also includes BL, Library of Congress, Library and Archives Canada, Nordic countries, Italy, Australia, Internet Archive.


INTERNATIONAL INTERNET PRESERVATION CONSORTIUM contd.

Aims to develop automated web-crawler mechanism.

Open-source tools to search web at regular intervals matching agreed collection development policies.

Working groups on: access tools, content management, deep web, framework, metrics and test-beds, researcher requirements.

Developmental at this stage.


INTERNET ARCHIVE

Collecting and saving sites since 1997.

Wayback Machine.

Legal, technical and procurement issues.


SOME CHALLENGES

Defining UK.

Rapid technology change.

Third party rights (not always subject to UK law).

Libel/defamation issues.

Software issues / which platform?

Validity of a snapshot.


SOME CHALLENGES contd.

Formats for archiving.

Metadata standards.

Archiving ‘look and feel’.

Authenticity.