Context Sensitive Archiving of Videos on the Web

JTS 2010, 3 May 2010

Paper authors: Thomas Drugeon, Valentine Frey, Jérôme Thièvre, Matteo Treleani


Page 1: JTS 2010, 3 May 2010

JTS 2010, 3 May 2010

Context Sensitive Archiving of Videos on the Web

Paper authors: Thomas Drugeon, Valentine Frey, Jérôme Thièvre, Matteo Treleani

Page 2: JTS 2010, 3 May 2010

Ina collections

Current collections

60 years of TV programmes and 70 years of radio programmes

Legal deposit since 1992

4,500,000 hours of TV and radio + 1,000,000 hours captured live from 102 TV and radio channels each year

Context sensitive archiving on the web | 2 May 2010

Extension to the Web

Web legal deposit law (2006), shared between BnF and Ina, as an extension to their current collections

Ina is developing specialized tools and methods to collect, archive, preserve, and give access to this archived web collection

→ Preserve, promote, transmit

Page 3: JTS 2010, 3 May 2010

Web Legal Deposit

Archiving French audiovisual information on the web

→ Focus on audiovisual contents

Why not archive only video and audio contents from the web?

The web is not just a way to access contents; it is a medium.

→ Archiving websites related to French audiovisual media

Operational since February 2009. As of April 2010:

- 6,000 websites (3,000 at start)

- 2,500,000,000 “objects”, 260 TB

- 10,000,000 video objects, 100 TB

- 19,000,000 audio objects, 100 TB

→ 260 TB compressed to only 21 TB of storage (DAFF)
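As a quick check of the figures above, the reduction from 260 TB collected to 21 TB stored can be worked out directly. The variable names are illustrative; the slide reports only the two totals and does not describe how DAFF achieves the reduction.

```python
# Figures quoted on the slide, in terabytes.
collected_tb = 260
stored_tb = 21

ratio = collected_tb / stored_tb        # how many times smaller the stored archive is
saving = 1 - stored_tb / collected_tb   # fraction of storage space saved

print(f"reduction factor: {ratio:.1f}x")
print(f"space saved: {saving:.0%}")
```

In other words, the stored archive is roughly twelve times smaller than the collected volume.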

Page 4: JTS 2010, 3 May 2010

Methods

The web is not a broadcast medium: there is no stream to capture and no explicit path to follow.

The web responds to interactions. We have to discover and recreate these interactions to archive it.

→ crawling

Websites grow and change in heterogeneous ways. We have to visit a page to know whether it was updated.

→ sampling

Accessing the archive means browsing it. We have to recreate the interactions to make the archive browsable.

→ simulating
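The sampling idea can be sketched as follows: since a change only becomes visible once the page has been fetched, a crawler can compare a digest of the newly fetched content with the digest recorded at the previous visit. This is a minimal sketch and not Ina's implementation; `content_digest`, `has_changed`, and the in-memory `last_seen` dictionary are hypothetical names, and the page bodies are hard-coded instead of being fetched.

```python
import hashlib


def content_digest(body: bytes) -> str:
    """Return the SHA-256 hex digest of a page body."""
    return hashlib.sha256(body).hexdigest()


def has_changed(url: str, body: bytes, last_seen: dict) -> bool:
    """Record the digest for url and report whether it differs from the previous visit.

    A first visit counts as a change, because nothing has been archived yet.
    """
    digest = content_digest(body)
    changed = last_seen.get(url) != digest
    last_seen[url] = digest
    return changed


# Hypothetical visits to the same page (bodies hard-coded for illustration).
last_seen = {}
first = has_changed("http://example.org/", b"<p>version 1</p>", last_seen)
second = has_changed("http://example.org/", b"<p>version 1</p>", last_seen)
third = has_changed("http://example.org/", b"<p>version 2</p>", last_seen)
print(first, second, third)
```

Any update that appears and disappears between two visits goes unnoticed, which is why the visit frequency matters.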

Page 5: JTS 2010, 3 May 2010

Limits

Crawling: Some interactions cannot be crawled, and thus some contents will be missing or altered in the archive (pages or parts of pages).

Sampling: Some updates will be missing. Linked pages are crawled at a different date from the original page.

Simulating: Dead web (train reservation, Google search, etc.). Some interactions are lost (crawling issues). Temporal inconsistencies between pages (sampling issues).
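The temporal inconsistency between pages can be made concrete by measuring the gap between the capture date of a page and the capture dates of the pages it links to. This is a minimal sketch under stated assumptions: the URLs, the dates, the one-day threshold, and the `capture_gaps` helper are invented for illustration and do not come from the DlWeb archive.

```python
from datetime import datetime, timedelta

# Hypothetical capture log: URL -> time at which the crawler archived it.
captures = {
    "http://example.org/news": datetime(2010, 4, 1, 8, 0),
    "http://example.org/video/42": datetime(2010, 4, 1, 8, 5),
    "http://example.org/about": datetime(2010, 4, 9, 8, 0),
}

# Hypothetical outgoing links of the archived page.
links = {
    "http://example.org/news": [
        "http://example.org/video/42",
        "http://example.org/about",
    ],
}


def capture_gaps(page, captures, links):
    """Return the absolute capture-time gap between page and each captured linked page."""
    base = captures[page]
    return {
        target: abs(captures[target] - base)
        for target in links.get(page, [])
        if target in captures
    }


gaps = capture_gaps("http://example.org/news", captures, links)
# Flag linked pages captured more than one day away from the source page.
inconsistent = [url for url, gap in gaps.items() if gap > timedelta(days=1)]
print(inconsistent)
```

A researcher browsing the archive could use such a gap as a clue that a linked page may not show what the original page pointed to at the time.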

Page 6: JTS 2010, 3 May 2010

Web Archaeology

The consequence of technical problems: non-integrity of web documents

Integrity: the document hasn’t been altered (Lynch, 1994)

Authenticity: the document is what it purports to be (Duranti, 2001)

Reliability: we can trust the document and its content (Bachimont, 2009)

How to preserve authenticity and reliability without depending on material integrity?

Reconstructing the meaning of the document through traces (a sort of archaeological practice)

DlWeb archives traces

Page 7: JTS 2010, 3 May 2010

Context influences the meaning of a video posted on the web.

But not all the items of the context have the same impact on interpretation.

Example

Preserving the meaning of a video posted on the web means preserving the significant elements of its context.

Meaning precedes the material form.

Web Archiving: pre-eminence of the meaning

We thus have to find the elements influencing the meaning.

Page 8: JTS 2010, 3 May 2010

Example: The relocation of The Eiffel Tower

Ina.fr posted a news programme from 1964: the Eiffel Tower was to be relocated. The video provoked a buzz on the Web.

Page 9: JTS 2010, 3 May 2010

Example: The relocation of The Eiffel Tower

A methodological approach: the commutation test (from linguistics).

Substituting one item of the expression can modify the meaning.

Example: changing a phoneme of a word (peer – beer).

How can we find which elements of the context to preserve in order to safeguard the archival value of the video (its correct interpretation)?

Page 10: JTS 2010, 3 May 2010

How to reconstruct the meaning in complex documents?

Where is the document and where is the context?

Web documents are often complex and refer to a wide spectrum of cultural elements.

Hypothesis

We can reconstruct the meaning through narrativization.

Narrativization can be based on the search for clues.

This is the critical historical approach that Ginzburg calls the “evidential paradigm” (the clues are in this case the significant elements found through the commutation test).

A Sherlock Holmes approach…

Page 11: JTS 2010, 3 May 2010

Example: narrativization based on clues

The Dailymotion channel of Gameblog.fr posts a France 2 news report from 21 November 2004 and explains that its content was an amalgam of fake news.

The report announces a collective suicide in Japan: 147 people are said to have committed suicide because of a delay in the release of a video game (Dead or Alive).

They swallowed some sachets of silicon…

Page 12: JTS 2010, 3 May 2010

A link in a comment allows us to better understand what happened.

France 2 cited an article that appeared in the newspaper Libération, reporting a collective suicide in March 2004.

The source of the article was a blog post.

Example: narrativization based on clues

Page 13: JTS 2010, 3 May 2010

The post was satirical: it appeared on the webzine Xbox Mag to mock the excessive interest in the release of this product by videogamers.

Example: narrativization based on clues

Page 14: JTS 2010, 3 May 2010

The editors of Xbox Mag notified France 2 and Libération of the error.

On 25 November, Libération published a correction.

On 26 November, France 2 announced the error, blaming the “Anglo-Japanese press” (their only source was Libération).

Example: narrativization based on clues

Page 15: JTS 2010, 3 May 2010

The complexity of a web document

The problem of the completeness of traces

To understand the facts we need no fewer than three web pages, which are often not linked to one another:

- The video posted on Dailymotion

- The original post on Xbox Mag

- The post on Xbox Mag explaining the errors

The Web always refers to (and remediates) other media:

- The archival video of France 2 (conserved at Inathèque)

- The press: Libération
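The chain of references in this example can be represented as a small directed graph in which each document points to the source it reposts or cites. The sketch below is illustrative only: the node labels paraphrase the slides, the edges encode the chain as the slides describe it, and `trace_sources` is a hypothetical helper, not a DlWeb tool.

```python
# Each document maps to the source it reposts or cites, as described in the slides.
cited_source = {
    "Dailymotion video (Gameblog.fr)": "France 2 news report",
    "France 2 news report": "Libération article",
    "Libération article": "Xbox Mag satirical post",
}


def trace_sources(start, cited_source):
    """Follow the chain of sources from start back to the earliest known one."""
    chain = [start]
    seen = {start}
    while chain[-1] in cited_source:
        nxt = cited_source[chain[-1]]
        if nxt in seen:  # guard against circular references
            break
        chain.append(nxt)
        seen.add(nxt)
    return chain


chain = trace_sources("Dailymotion video (Gameblog.fr)", cited_source)
print(" -> ".join(chain))
```

If any node of the chain is missing from the archive, the trace stops early, which is one way to picture the problem of the completeness of traces.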

The Intrinsic Value of a Web Document

Web Archiving is the most complete way to reconstruct these events

(TV and press are not sufficient)

The example reveals:

Page 16: JTS 2010, 3 May 2010

How to help reconstruct the narration?

Give the researcher access to all available technical and methodological information (i.e. the archiving context)

→ clues

DlWeb archives traces

Develop tools to help the researcher to organise and exploit these clues

→ Methodological DlWeb workshops with audiovisual researchers, archivists and documentalists

Improve completeness