Can you save the web? Web Archiving!


Uploaded by vangelis-banos on 22 May 2015


Category: Technology



DESCRIPTION

1. What do you mean?
2. What is web archiving?
3. The practical use of web archives.
4. Making your own web archive.

TRANSCRIPT

Page 1: Can you save the web? Web Archiving!

Can we save the web?

Vangelis Banos, http://vbanos.gr/

WEB ARCHIVING

Unconference, 9-10 December 2013

Page 2

Can we save the web?

• What do you mean?

• What is web archiving?

• The practical use of web archives.

• Making your own web archive.

Page 3

What is the World Wide Web?

A huge collection of digital documents (websites) stored on special computers (web servers) and interconnected with each other.

Page 4

What is the World Wide Web?

Page 5

What is the World Wide Web?

Page 6

What is the World Wide Web?

Page 7

What is on the web?

What isn’t on the web?

Page 8

Why save the web?

1. More and more material is born digital only!
2. Some websites contain unique data and valuable information.
  – Users take action and make important decisions based on this information.
3. The web is a live record of contemporary:
  1. Society,
  2. Culture,
  3. Science,
  4. Economy.
4. There is a responsibility to preserve the web.
5. Saving the web promotes transparency.

Page 9

Isn’t the web already safe?

• The answer is: NOT really!
• Websites are in danger:
  – Organisations that maintain them stop caring about them,
  – Organisations that maintain them cease to exist,
  – Natural disasters destroy computer facilities (fires, floods, storms, etc.),
  – Technical problems damage websites (bugs, computer viruses, backup failures, hardware failures),
  – Their data are tampered with on purpose, for many reasons (political, financial, criminal, etc.)!

Page 10

A major blog hosting company was shut down by the U.S. authorities

Page 11

Yahoo! GeoCities has closed.

Page 12

Natural disasters cause data center problems

Page 13

Websites are tampered with all the time

Page 14

Websites are tampered with all the time

Page 15

Does this sound familiar?

Page 16

Can we save the web?

• What do you mean?

• What is web archiving?

• The practical use of web archives.

• Making your own web archive.

Page 17

Websites are tampered with all the time

Page 18

Web Archiving

MTSR 2013, 22 Nov 2013, Thessaloniki

The Internet Archive has backups

Page 19

WEB ARCHIVING

The process of collecting portions of the World Wide Web to ensure the information is preserved in an archive for future researchers, historians, and the public.
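Technically, "collecting portions of the web" usually means crawling: start from a seed address, save each page, and follow its links while they stay inside the chosen website. The Python sketch below is a minimal illustration of that loop, not how any particular archiving tool works. `get_links` is a hypothetical stand-in for downloading and parsing a page, so the example runs on a small in-memory link graph with made-up `example.org` addresses.

```python
from collections import deque
from typing import Callable, Iterable
from urllib.parse import urlparse


def collect(seed: str, get_links: Callable[[str], Iterable[str]],
            max_pages: int = 100) -> list[str]:
    """Breadth-first collection of the pages on the seed's host.

    get_links(url) returns the URLs linked from a page; a real crawler
    would download the page, store it, and parse its links at this point.
    """
    host = urlparse(seed).netloc
    seen = {seed}
    queue = deque([seed])
    archived: list[str] = []
    while queue and len(archived) < max_pages:
        url = queue.popleft()
        archived.append(url)  # stand-in for saving the page to the archive
        for link in get_links(url):
            # Stay on the same host and never queue a page twice.
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return archived


# Hypothetical link graph standing in for live HTTP requests.
SITE = {
    "http://example.org/": ["http://example.org/a", "http://other.net/"],
    "http://example.org/a": ["http://example.org/b", "http://example.org/"],
    "http://example.org/b": [],
}

print(collect("http://example.org/", lambda url: SITE.get(url, [])))
```

The `max_pages` cap is one simple way to keep a crawl bounded. Real crawlers add much more (delays between requests, storage formats, deduplication); the point here is only the seed-plus-links loop.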

Page 20

Challenges

• How is it done technically?
• What should I choose to archive?
  – The whole website? Some pages? Some files only?
• What do I want to do with the web archive I'm creating?
• Who will have access?
• Who is the owner of the web archive content?
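In practice, the answer to "the whole website? some pages? some files only?" becomes a scope rule that is checked for every URL a crawler discovers. A minimal Python sketch, assuming hypothetical mode names `site`, `pages` and `files` (one per question) that are not taken from any real tool:

```python
from urllib.parse import urlparse


def in_scope(url: str, mode: str, host: str, path_prefix: str = "/",
             extensions: tuple[str, ...] = ()) -> bool:
    """Decide whether a discovered URL belongs in the archive.

    Hypothetical scope modes, one per question on the slide:
      "site"  - the whole website: every URL on `host`
      "pages" - some pages: URLs on `host` under `path_prefix`
      "files" - some files only: URLs on `host` ending in one of
                `extensions` (given in lowercase, e.g. ".pdf")
    """
    parts = urlparse(url)
    if parts.netloc != host:
        return False  # other websites are always out of scope here
    if mode == "site":
        return True
    if mode == "pages":
        return parts.path.startswith(path_prefix)
    if mode == "files":
        return parts.path.lower().endswith(extensions)
    raise ValueError(f"unknown scope mode: {mode!r}")


print(in_scope("http://www.auth.gr/news/1", "pages", "www.auth.gr",
               path_prefix="/news/"))
print(in_scope("http://www.auth.gr/docs/report.PDF", "files", "www.auth.gr",
               extensions=(".pdf",)))
```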

Page 21

Archiving web pages is a technical challenge

[Diagram: Generic file archiving operation, showing File(s), Software and Hardware combined into one RECORD]

Page 22

Archiving web pages is a technical challenge

[Diagram: Web archiving operation, showing a Website made of many File(s), several Software components and Hardware]
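Why a website is not a single file: even one HTML page usually pulls in stylesheets, scripts and images, and these must be captured too if the copy is to look like the original. A sketch using only Python's standard `html.parser`; the set of watched tags and attributes is an assumption and not exhaustive (and `link href` can also point at things that are not files, such as canonical URLs). The sample page and `example.org` address are made up.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ResourceFinder(HTMLParser):
    """Collect the URLs of files a page embeds (styles, scripts, images...)."""

    # (tag, attribute) pairs that commonly point at embedded files.
    WATCHED = {("img", "src"), ("script", "src"), ("link", "href"),
               ("iframe", "src"), ("source", "src")}

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.resources: list[str] = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if (tag, name) in self.WATCHED and value:
                # Resolve relative paths against the page's own address.
                self.resources.append(urljoin(self.base_url, value))


def find_resources(html: str, base_url: str) -> list[str]:
    finder = ResourceFinder(base_url)
    finder.feed(html)
    finder.close()
    return finder.resources


PAGE = """<html><head>
<link rel="stylesheet" href="/css/style.css">
<script src="js/app.js"></script>
</head><body><img src="logo.png"><a href="/about">About</a></body></html>"""

print(find_resources(PAGE, "http://example.org/index.html"))
```

Ordinary links (`<a href>`) are deliberately left out: they lead to other pages, which a crawler visits separately, rather than to files this page needs in order to display.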

Page 23

How is it done?

• Possible web archiving targets:
  – Government websites,
  – Educational institutions,
  – People's suggestions,
  – Currently popular websites,
  – Popular media,
  – Big companies,
  – Special events

Page 24

Web archiving strategies

Page 25

Who is working on web archiving?

Many important organisations have been working on web archiving since 1996.

Page 26

International Internet Preservation Consortium

• IIPC members:
  – National libraries,
  – Academic libraries,
  – Cultural organisations,
  – Universities,
  – Software development companies
• Web Archiving Timeline:
  – http://timeline.webarchivists.org/

Page 27

Obligation of the National Library

• According to UNESCO:
  – «a national library is responsible for the collection and storage of the national cultural heritage».
• In Greece, according to Law No. 3149/03:
  – «publishers or authors (when there is no publisher) of any printed material are obliged to submit three copies of their work to the National Library of Greece. This obligation also includes audiovisual and e-publishing material».

• What about the Greek web?

Page 28

Bibliothèque nationale de France

2006: legal deposit extended to “signs, signals, writings, images, sounds or messages of any kind communicated to the public by electronic means”.

The goal is not to gather the «best of the web», but to preserve a collection representative of the web at a certain date.

Page 29

Can we save the web?

• What do you mean?

• What is web archiving?

• The practical use of web archives.

• Making your own web archive.

Page 30

Visiting the Internet Archive

• http://archive.org/

Page 31

Internet Archive activities

• Key features, browsing, searching.
• Indicative websites:
  – Greek Ministry of Education (Υπουργείο Παιδείας), 3 Jul 2010, www.minedu.gov.gr
  – Greek Ministry of Development (Υπουργείο Ανάπτυξης), 21 Dec 2009, http://www.ypoian.gr/
  – The White House, 7 Apr 2000, http://www.whitehouse.gov
  – BBC, 11 Sept 2001, http://www.bbc.co.uk/
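Captures in the Internet Archive's Wayback Machine can be addressed by date, in the form `https://web.archive.org/web/<timestamp>/<url>`. The timestamp is `YYYYMMDDhhmmss` and may be shortened to just a date, in which case the service redirects to the closest capture it holds. The sketch below builds such addresses for two of the sites listed above, plus a query URL for the Wayback Machine availability API (`https://archive.org/wayback/available`). It only constructs URLs; it does not check that a capture actually exists for that date.

```python
from datetime import date
from urllib.parse import urlencode


def wayback_url(site: str, day: date) -> str:
    """Wayback Machine address for `site` near `day` (timestamp cut to YYYYMMDD)."""
    return f"https://web.archive.org/web/{day:%Y%m%d}/{site}"


def availability_query(site: str, day: date) -> str:
    """Availability API query asking for the capture closest to `day`."""
    params = urlencode({"url": site, "timestamp": f"{day:%Y%m%d}"})
    return f"https://archive.org/wayback/available?{params}"


# Two of the indicative captures listed on the slide.
for site, day in [("http://www.whitehouse.gov", date(2000, 4, 7)),
                  ("http://www.bbc.co.uk/", date(2001, 9, 11))]:
    print(wayback_url(site, day))
    print(availability_query(site, day))
```

Opening the first kind of URL in a browser is usually enough for browsing; the availability API is handy when a script needs to find the nearest capture before linking to it.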

Page 32

Visiting Archive-It

• http://archive-it.org/

Page 33

Archive-It activities

• Key features, browsing, searching, collections.
• Examples:
  – Egypt Revolution and Politics, American University in Cairo,
  – 2008 Beijing Olympic Games,
  – Libyan Uprisings, University of Michigan,
  – Venice Biennale 2013

Page 34

Can we save the web?

• What do you mean?

• What is web archiving?

• The practical use of web archives.

• Making your own web archive.

Page 35

HTTrack website copier

http://www.httrack.com

Page 36

Making your own web archive

• Using HTTrack software (open source)
  – Installation
  – Practical advice
  – Features
  – Usage scenarios
    • Archive http://2013.futurelibrary.gr/
    • Archive http://www.auth.gr/
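HTTrack also ships a command-line version. The sketch below builds one `httrack` command per scenario site, assuming HTTrack is installed and that `/data/webarchive` (a hypothetical path) is where mirrors should go. In the command, `-O` sets the output directory, a `+*.domain/*` filter keeps the crawl inside the site's domain, and `-v` prints progress. The script only prints the commands; check the option letters against `httrack --help` for your version before relying on them.

```python
import shlex
from pathlib import PurePosixPath

# Hypothetical archive location; on Windows this would be a drive path.
ARCHIVE_ROOT = PurePosixPath("/data/webarchive")

# The two scenario sites, each with a "+" filter that keeps HTTrack
# inside that site's domain.
SITES = {
    "http://2013.futurelibrary.gr/": "+*.futurelibrary.gr/*",
    "http://www.auth.gr/": "+*.auth.gr/*",
}


def httrack_command(url: str, scope_filter: str) -> list[str]:
    """Argument list for one HTTrack mirror run (built here, not executed)."""
    host = url.split("//", 1)[1].strip("/")
    output_dir = ARCHIVE_ROOT / host  # one folder per website
    # -O: output directory, then the scope filter, -v: verbose progress
    return ["httrack", url, "-O", str(output_dir), scope_filter, "-v"]


for url, scope in SITES.items():
    print(shlex.join(httrack_command(url, scope)))
```

Passing one of these lists to `subprocess.run` would actually start the mirror, so keep a separate folder per site and review the printed commands first.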

Page 37

Things worth considering
• Set limits:
  – Filters to define the file types you want to copy.
  – Bandwidth limits and connection limits, to avoid overloading the site you are archiving AND to avoid saturating your library network.
  – Time limits.
• Check the size of the files you have downloaded.
• Plan disk space according to your needs.
• Check the target website's copyright terms. Are you allowed to:
  – Archive for personal use?
  – Archive for public use on library computers?
  – Archive to publish on the web?
• If you are not sure, please ask the website owner before you begin web archiving.
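In the command-line version of HTTrack, the limits above correspond to options. According to the HTTrack manual, `-A` caps the transfer rate in bytes per second, `-c` sets the number of simultaneous connections, `-E` caps the mirror time in seconds, `-M` caps the total size in bytes, and `+pattern` / `-pattern` filters accept or reject file types. The helper below and its sample values are hypothetical; verify the options with `httrack --help` before use.

```python
from typing import Sequence


def limit_options(max_rate_bps: int, connections: int, max_seconds: int,
                  max_total_bytes: int, accept: Sequence[str] = (),
                  reject: Sequence[str] = ()) -> list[str]:
    """Turn the slide's limits into HTTrack command-line options.

    Option letters as documented in the HTTrack manual:
      -A<N>  maximum transfer rate, bytes per second (bandwidth limit)
      -c<N>  number of simultaneous connections (connection limit)
      -E<N>  maximum mirror time, seconds (time limit)
      -M<N>  maximum overall size, bytes
    File-type filters are separate "+pattern" / "-pattern" arguments.
    """
    if connections < 1:
        raise ValueError("need at least one connection")
    options = [f"-A{max_rate_bps}", f"-c{connections}",
               f"-E{max_seconds}", f"-M{max_total_bytes}"]
    options += [f"+{pattern}" for pattern in accept]  # file types to copy
    options += [f"-{pattern}" for pattern in reject]  # file types to skip
    return options


# Hypothetical values: 100 KB/s, 2 connections, 6 hours, 5 GB;
# keep PDFs, skip video files.
print(limit_options(100_000, 2, 6 * 3600, 5 * 10**9,
                    accept=["*.pdf"], reject=["*.mp4", "*.avi"]))
```

Low rate and connection values are the polite choice for small sites and a shared library network; the time and size caps stop a mirror that turns out much larger than expected.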

Page 38

Scenario: create your own mini web archive in your library on a shoestring.

• Equipment:
  – A typical Windows computer with the biggest possible hard disk (the more TB, the better).
  – A backup disk of equal size (e.g. an external USB hard disk).
  – A DSL Internet connection.
  – HTTrack open source software.
• Select important local websites.
• Get permission from website owners if necessary.
• Set up a regular web archiving schedule (e.g. once per month).
• Provide information about the web archive, and access to it, on your library's local computers for the public.
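With a monthly schedule, every site gains one new snapshot per month, so disk needs grow steadily. The sketch below assumes (hypothetically) one folder per site per month under an archive root. It measures a finished snapshot with `os.walk` and estimates how many more snapshots of similar size fit in the free space reported by `shutil.disk_usage`. The estimate assumes the site does not grow, and the backup disk needs the same amount of space.

```python
import os
import shutil
from datetime import date


def snapshot_dir(root: str, site: str, day: date) -> str:
    """One folder per site per month, e.g. <root>/www.auth.gr/2013-12."""
    return os.path.join(root, site, f"{day:%Y-%m}")


def folder_size(path: str) -> int:
    """Total size in bytes of every file below `path`."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            total += os.path.getsize(os.path.join(dirpath, name))
    return total


def snapshots_that_fit(free_bytes: int, snapshot_bytes: int) -> int:
    """How many more snapshots of about the same size fit in free_bytes."""
    if snapshot_bytes <= 0:
        raise ValueError("snapshot size must be positive")
    return free_bytes // snapshot_bytes


if __name__ == "__main__":
    # Hypothetical archive root and site; adjust to your own disk layout.
    last = snapshot_dir("/data/webarchive", "www.auth.gr", date(2013, 12, 1))
    if os.path.isdir(last):
        size = folder_size(last)
        free = shutil.disk_usage(last).free
        if size:
            print(f"{size} bytes per snapshot, room for about "
                  f"{snapshots_that_fit(free, size)} more monthly snapshots")
```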

Page 39

Can we save the web?

YES WE CAN!

• Questions?
• Thank you for your attention
• Contact:
  – Web: http://vbanos.gr
  – Email: [email protected]
  – Twitter: @vbanos