Web archiving in the UK: why, by whom, for whom? Dr Peter Webster Webster Research and Consulting @pj_webster / @WebsterRandC

Post on 05-Jun-2020



Page 2

The web as its own archive?

Open UK Web Archive 2004-13 comparison. @anjacks0n http://britishlibrary.typepad.co.uk/webarchive/2014/10/what-is-still-on-the-web-after-10-years-of-archiving-.html

Page 3

Disappearing predictably

Page 4

Disappearing unpredictably

Page 5

What is web archiving?

“deliberate and purposive preservation of web material” (Brügger, 2011)

• micro or macro

• element, page, website, web sphere, whole

• harvesting, screen capture, file delivery

• public, restricted, or no access

Page 6

[ archive.org ]

Page 7

National libraries

• 16 of the 28 EU member states

• Sweden the first (1996)

• US (Library of Congress), Canada, Australia, New Zealand, Singapore, Japan, Chile

• some with legal deposit provision: Denmark (2005), France (2006), UK (2013)

Page 8

Legal deposit web archiving: characteristics

• broad domain crawl, plus selective

• definition of the nation varies

• types of content included vary

• access restrictions

• indemnity against legal risks
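Because the definition of the nation varies between legal deposit regimes, a crawl's scope is usually expressed as a configurable rule. A minimal sketch, assuming a hypothetical rule that combines national domain suffixes with a curator-maintained allow-list of hosts; `in_scope`, `UK_SUFFIXES` and `CURATED_HOSTS` are illustrative names, not any crawler's API.

```python
from urllib.parse import urlsplit


def in_scope(url: str, suffixes: tuple[str, ...], allow_hosts: frozenset[str]) -> bool:
    """Hypothetical scope rule for a national domain crawl.

    A URL is in scope if its host is on a curated allow-list (e.g. sites
    judged to belong to the nation but registered under another domain),
    or if the host ends with one of the national domain suffixes.
    """
    # .hostname strips any port and lower-cases the host; it is None for
    # strings that have no network location.
    host = (urlsplit(url).hostname or "").lower()
    if host in allow_hosts:
        return True
    return any(host.endswith(suffix) for suffix in suffixes)


# Illustrative configuration: ".uk" plus one hypothetical curated host.
UK_SUFFIXES = (".uk",)
CURATED_HOSTS = frozenset({"example.com"})

print(in_scope("https://www.gov.uk/", UK_SUFFIXES, CURATED_HOSTS))        # True
print(in_scope("https://example.com/about", UK_SUFFIXES, CURATED_HOSTS))  # True
print(in_scope("https://example.org/", UK_SUFFIXES, CURATED_HOSTS))       # False
```

Keeping the suffixes and the allow-list as parameters reflects the point on the slide: two regimes can share the same crawler and differ only in configuration.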

Page 9

Selective harvesting

• in the absence of NPLD (non-print legal deposit), based on permissions

• part of the case for obtaining NPLD law

• key resources, e.g. government, media

• events: elections, Olympics, Eurovision

• themes: political extremism, climate change

Page 10

Why archive your own web?

• part of the orderly management of closure

• fulfilment of legal obligation

• management of risk

• part of the corporate record

• as a service for future scholars

Page 11

Government records

Page 12

A lost archive?

Page 13

A lost archive?

Page 14

A lost archive?

Page 15

Web archives in the UK

Archive | Temporal scope | Content scope | Access

Open UKWA | 2004-present | Selective | Online

Legal Deposit UKWA | 2013-present | Comprehensive (for UK) | Onsite

JISC UK Domain Dataset | 1996-2013 | Comprehensive (for .uk) | Index only

UK Government Web Archive | 1996-present | UK government | Online

Parliamentary Web Archive | 2009-present | UK parliament | Online

Univ. of Oxford | 2011-present | University sites | Online

Page 16

Tricky areas

• IPR (intellectual property rights), including those of third parties

• personal data

• the right to be forgotten

• database-driven content

• embedded streaming media

Page 17

Outsourcing providers

Not-for-profit

• Archive-It (part of the Internet Archive)

• Internet Memory Research

Commercial

• Hanzo Archives [UK]

• OIA (Offline Web Archive) [Germany]

• Pagefreezer [Canada/Netherlands]

Page 18

Ways to use the archived web

• URL search -> single page

• Full-text search -> single page

• Visualisation -> trend -> page

Page 19

Changing aesthetic

gov.ie, captured by archive.org, 15 August 2000

Page 20

Full-text search

webarchive.org.uk/shine - https://github.com/ukwa/shine/
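Full-text search over an archive rests on an inverted index: a map from each term to the captures that contain it. The toy sketch below illustrates the idea only and is not how Shine is implemented; the capture keys, texts and function names are hypothetical, and a query returns captures containing every query term (boolean AND).

```python
import re
from collections import defaultdict

# A capture is identified by (original URL, 14-digit capture timestamp).
Capture = tuple[str, str]


def tokenize(text: str) -> list[str]:
    # Crude tokenizer: lower-case alphanumeric runs only.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(captures: dict[Capture, str]) -> dict[str, set[Capture]]:
    """Map each term to the set of captures whose extracted text contains it."""
    index: dict[str, set[Capture]] = defaultdict(set)
    for capture, text in captures.items():
        for term in tokenize(text):
            index[term].add(capture)
    return index


def search(index: dict[str, set[Capture]], query: str) -> set[Capture]:
    """Return the captures containing every term in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    hits = set(index.get(terms[0], set()))
    for term in terms[1:]:
        hits &= index.get(term, set())
    return hits


# Hypothetical captures; the URLs and texts are made up for illustration.
captures = {
    ("http://www.example.co.uk/results", "20100507120000"): "General election results",
    ("http://www.example.org.uk/climate", "20100601090000"): "Climate change and the election",
}
idx = build_index(captures)
print(search(idx, "climate election"))
```

Each hit is a single capture, which matches the slide's "full-text search -> single page" route into the archive.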

Page 21

Visualising trends: ngram
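An ngram-style trend counts how often a term occurs over time. Because the archive grows from year to year, raw counts mostly show that growth, so a common approach is to normalise by the number of documents captured in each year. A minimal sketch for single-word terms, assuming the input is a list of hypothetical (year, extracted text) pairs; this is one plausible calculation, not necessarily the one behind the UKWA visualisation.

```python
import re
from collections import Counter


def term_trend(docs: list[tuple[int, str]], term: str) -> dict[int, float]:
    """Share of each year's documents whose text contains `term` as a word."""
    term = term.lower()
    totals: Counter = Counter()  # documents captured per year
    hits: Counter = Counter()    # documents containing the term per year
    for year, text in docs:
        totals[year] += 1
        if term in set(re.findall(r"[a-z0-9]+", text.lower())):
            hits[year] += 1
    # Normalising by the yearly total makes years of different size comparable.
    return {year: hits[year] / totals[year] for year in sorted(totals)}


# Hypothetical documents.
docs = [
    (2004, "climate change"),
    (2004, "election night"),
    (2005, "climate news"),
    (2005, "climate summit"),
]
print(term_trend(docs, "Climate"))  # {2004: 0.5, 2005: 1.0}
```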

Page 22

Ways to use the archived web

• URL search -> single page

• Full-text search -> single page

• Visualisation -> trend -> page

• Direct access to WARC

• Derived datasets

• API access
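Direct access means reading WARC files, in which each record is a `WARC/1.0` version line, a block of `Name: value` header fields, a blank line, exactly `Content-Length` bytes of content, and a closing CRLF CRLF. The sketch below reads only *uncompressed* WARCs and ignores header continuation lines. Real WARCs are normally gzip-compressed per record, and in practice a library such as warcio would be used. The function name and sample record are illustrative.

```python
import io
from typing import Iterator


def iter_warc_records(stream: io.BufferedIOBase) -> Iterator[tuple[dict[str, str], bytes]]:
    """Yield (headers, content block) for each record in an uncompressed WARC stream."""
    while True:
        version = stream.readline()
        if not version:
            return  # end of file
        if not version.strip():
            continue  # tolerate stray blank lines between records
        if not version.startswith(b"WARC/"):
            raise ValueError(f"not a WARC record: {version!r}")
        headers: dict[str, str] = {}
        while True:
            line = stream.readline()
            if line in (b"\r\n", b"\n", b""):
                break  # blank line ends the header block
            name, _, value = line.decode("utf-8").partition(":")
            headers[name.strip()] = value.strip()
        block = stream.read(int(headers["Content-Length"]))
        stream.read(4)  # consume the CRLF CRLF that terminates the record
        yield headers, block


# A single hand-made record with an illustrative 5-byte payload.
sample = (
    b"WARC/1.0\r\n"
    b"WARC-Type: response\r\n"
    b"WARC-Target-URI: http://www.example.co.uk/\r\n"
    b"Content-Length: 5\r\n"
    b"\r\n"
    b"hello"
    b"\r\n\r\n"
)
for headers, block in iter_warc_records(io.BytesIO(sample)):
    print(headers["WARC-Target-URI"], len(block))  # http://www.example.co.uk/ 5
```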

Page 23

Derived datasets from the BL

From the JISC UK Web Domain Dataset (1996-2010)

• File format profile

• Geo-index

• Crawled URL Index (CDX)

• Host Link Graph

Public domain at data.webarchive.org.uk
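A CDX index has one line per captured URL. Its offset and filename fields point into the WARC/ARC files, and its other columns can be aggregated directly, for example into a file format profile. A minimal sketch, *assuming* the common 11-field layout declared by the header ` CDX N b a m s k r M S V g` (URL key, timestamp, original URL, MIME type, status, digest, redirect, meta tags, record length, offset, filename). The BL dataset's exact layout should be checked before relying on this. The sample lines are made up.

```python
from collections import Counter
from typing import Iterable, Iterator, NamedTuple


class CdxLine(NamedTuple):
    urlkey: str     # N: canonicalised (SURT) URL key
    timestamp: str  # b: 14-digit capture time
    original: str   # a: original URL
    mimetype: str   # m
    status: str     # s: HTTP status code
    digest: str     # k: payload checksum
    redirect: str   # r
    meta: str       # M: meta tags
    length: str     # S: compressed record length
    offset: str     # V: offset of the record in the file
    filename: str   # g: WARC/ARC file holding the record


def parse_cdx(lines: Iterable[str]) -> Iterator[CdxLine]:
    for line in lines:
        if not line.strip() or line.lstrip().startswith("CDX"):
            continue  # skip blank lines and the format header
        yield CdxLine(*line.split())


def format_profile(records: Iterable[CdxLine]) -> Counter:
    """Count captures per (year, MIME type), a crude file format profile."""
    return Counter((r.timestamp[:4], r.mimetype) for r in records)


# Hypothetical CDX lines in the assumed layout.
lines = [
    " CDX N b a m s k r M S V g",
    "uk,co,example)/ 20080211003812 http://www.example.co.uk/ text/html 200 ABC - - 1043 0 crawl-2008.warc.gz",
    "uk,co,example)/logo.png 20080211003813 http://www.example.co.uk/logo.png image/png 200 DEF - - 2048 1043 crawl-2008.warc.gz",
]
records = list(parse_cdx(lines))
print(format_profile(records))
```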

Page 24

[ http://www.webarchive.org.uk/ukwa/visualisation/ukwa.ds.2/fmt ]

Page 25

[Wikimedia Commons, CC BY SA 2.0, by Brian (of Toronto)]

Page 26

A media firestorm

Page 27

[https://web.archive.org/web/20080211003812/http://www.newsoftheworld.co.uk/1002_sharia.shtml]
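A replay URL like the one above encodes the capture: `/web/`, then a 14-digit `YYYYMMDDhhmmss` timestamp, then the original URL. That structure is what makes URL search lead to a single dated page. A minimal sketch that parses and rebuilds such URLs; it accepts only full 14-digit timestamps and no replay modifiers, and the function names are illustrative.

```python
import re
from datetime import datetime

# Replay URLs of the form https://<host>/web/<YYYYMMDDhhmmss>/<original URL>.
_WAYBACK = re.compile(r"^https?://[^/]+/web/(\d{14})/(.+)$")


def parse_wayback_url(url: str) -> tuple[datetime, str]:
    """Split a replay URL into (capture time, original URL)."""
    match = _WAYBACK.match(url)
    if match is None:
        raise ValueError(f"not a 14-digit Wayback replay URL: {url}")
    timestamp, original = match.groups()
    return datetime.strptime(timestamp, "%Y%m%d%H%M%S"), original


def build_wayback_url(original: str, captured: datetime, host: str = "web.archive.org") -> str:
    """Inverse of parse_wayback_url for the same URL pattern."""
    return f"https://{host}/web/{captured:%Y%m%d%H%M%S}/{original}"


url = "https://web.archive.org/web/20080211003812/http://www.newsoftheworld.co.uk/1002_sharia.shtml"
captured, original = parse_wayback_url(url)
print(captured, original)  # 2008-02-11 00:38:12 http://www.newsoftheworld.co.uk/1002_sharia.shtml
```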

Page 28

UK Host Link Graph (1996-2010)

2008 | newsimg.bbc.co.uk | youtube.com | 45

2008 | archbishopofyork.org.uk | flickr.com | 1

2002 | secularism.org.uk | geocities.com | 1

Public domain at: data.webarchive.org.uk
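Rows like those above can be aggregated to see which external hosts UK sites linked to in a given year. A minimal sketch, *assuming* each row reads `year | source host | target host | link count`, which is the layout the example rows suggest. The dataset's documentation should be checked before relying on this, and the function names are illustrative.

```python
from collections import defaultdict
from typing import Iterable


def parse_link_row(row: str) -> tuple[int, str, str, int]:
    """Parse an assumed 'year | source | target | count' row."""
    year, source, target, count = (field.strip() for field in row.split("|"))
    return int(year), source, target, int(count)


def inbound_by_year(rows: Iterable[str]) -> dict[int, dict[str, int]]:
    """Total links received by each target host, per year."""
    totals: dict[int, dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for row in rows:
        year, _source, target, count = parse_link_row(row)
        totals[year][target] += count
    return {year: dict(targets) for year, targets in totals.items()}


# The three example rows from the slide.
rows = [
    "2008 | newsimg.bbc.co.uk | youtube.com | 45",
    "2008 | archbishopofyork.org.uk | flickr.com | 1",
    "2002 | secularism.org.uk | geocities.com | 1",
]
print(inbound_by_year(rows))
```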

Page 29

[https://web.archive.org/web/20080211003812/http://www.newsoftheworld.co.uk/1002_sharia.shtml]

Page 30

Questions?

Peter Webster

[email protected]

@pj_webster / @WebsterRandC

websterresearchconsulting.com