SCAPE

David Tarrant @davetaz davetaz@ecs.soton.ac.uk

Open Planets Foundation / University of Southampton

iPres2012

Toronto, October 2012

LDS3: Applying Preservation Principals to Linked Data Systems

This work was partially supported by the SCAPE Project. The SCAPE project is co-funded by the European Union under FP7 ICT-2009.4.1 (Grant Agreement number 270137).

Present Day

Presenting the REF

The Results Evaluation Framework

• 5 Tools (Droid, Fits, file, fido, Tika)

• 65 Versions (from 2008 to now)

• 1 Govdocs corpus

• 1 Question….


How accurate are file format identification tools historically?


PDF 1.4


DOCX

9 Months Ago

Why is Data Important?

• Data and Metadata are knowledge.

• Knowledge is power.

• Knowledge enables decision.

• Knowledge enables process.

• Knowledge empowers action.

• Knowledge enables us to say because…

Processes

[Figure: A Classic Flow Chart. Node labels: Process, Decision, DATA (three times)]

Data is key to making decisions

Policy

[Figure: A Preservation Flow Chart. Node labels: Process, Policy, DATA (three times)]

Data is key to informing policy

Policy Data - Generated

• When?

• Who?

• What it affects?

• What action is taken?

• Why?

[Figure label: Policy]

Why?

• Because something said so?

• When?

• Who?

• What it affects?

• What action is taken?

• Why?

[Figure labels: DATA (three times)]

Case Study Example (Opinion)

• Due to format obsolescence, all Flash video files are to be migrated to H.264/AAC.

• Input data: Study on proliferation of Flash and evidence of lacking support from the rights holder, Adobe.

• File B was created from File A a year ago as it was identified as being a Flash video file.

• Today, File A is identified as being an Ogg video file.

• What has changed? Why? Does it affect me? Who generated the wrong information? Did they generate any other wrong information?

I Don’t Know!

6 Months Ago

A Fact?

[Figure: File#1 hasIdentification application/zip]
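The fact on this slide has the shape of a subject, predicate, object statement. The following is a minimal sketch in plain Python of that shape; the names `Triple` and `describe` are hypothetical and are not taken from LDS3 or any RDF library. Note that a bare triple like this carries no information about who asserted it or when, which is the gap the talk goes on to address.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """A single statement: subject, predicate, object (hypothetical helper type)."""
    subject: str
    predicate: str
    obj: str


# The fact shown on the slide.
fact = Triple("File#1", "hasIdentification", "application/zip")


def describe(t: Triple) -> str:
    """Render a triple as a readable arrow expression."""
    return f"{t.subject} --{t.predicate}--> {t.obj}"


print(describe(fact))
```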

Provenance

• Tarrant, David and Carr, Leslie (2012) LDS3: Applying Digital Preservation Principals to Linked Data Systems. In, Ninth International Conference on Digital Preservation (iPres2012), Toronto, Canada

[Figure labels: Tim Berners-Lee, Provides, 5-Star Linked Data Guide]

Data!!!

• One fact.

• One document the fact comes from.

• One citation about the document's place of publication.

• Who, What, When and Where.

• Who they worked for and with.

Named-Graph

• In Linked Data a document is called a named-graph.

• But these also get used for two purposes!!

[Figure: File#1 hasIdentification application/zip]

The two uses of the named-graph

No. 1 - Data Publication

[Figure: DATA (three times) and a Named-Graph containing File#1 hasIdentification application/zip]

The two uses of the named-graph

No. 2 - Data Discovery/Query

[Figure: a Named-Graph containing File#1 hasIdentification application/zip; DATA (three times); File#1 hasIdentification application/msword]

The two uses of the named-graph

No. 2 - Data Discovery/Query

[Figure: two "Works For" links; two Named-Graphs, each containing File#1 hasIdentification application/zip; File#1 hasIdentification application/msword]

Quads

[Figure: Query Graph, Source Graph 1 and Source Graph 2, holding File#1 hasIdentification application/zip and File#1 hasIdentification application/msword]

After all, RDF is a graph model

RDF the spec, not the RDF/XML serialization
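A quad extends a triple with the name of the graph it came from, so a query over several source graphs can return conflicting answers without losing track of where each one originated. A minimal sketch in plain Python follows; `Quad`, `query`, and the graph names are hypothetical, and which source graph holds which identification is an assumption made only for illustration.

```python
from typing import List, NamedTuple, Tuple


class Quad(NamedTuple):
    """A triple plus the name of the graph that asserts it (hypothetical type)."""
    subject: str
    predicate: str
    obj: str
    graph: str


# Two conflicting identifications of the same file.
# The assignment of each statement to a graph is assumed for illustration.
quads = [
    Quad("File#1", "hasIdentification", "application/zip", "source-graph-1"),
    Quad("File#1", "hasIdentification", "application/msword", "source-graph-2"),
]


def query(data: List[Quad], subject: str, predicate: str) -> List[Tuple[str, str]]:
    """Return (object, graph) pairs so every answer keeps its source."""
    return [
        (q.obj, q.graph)
        for q in data
        if q.subject == subject and q.predicate == predicate
    ]


results = query(quads, "File#1", "hasIdentification")
for value, source in results:
    print(f"{value} (from {source})")
```

Because the graph name travels with each statement, the two answers can sit side by side in one query result rather than one silently overwriting the other.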

Quads

[Figure: Query Graph, Source Graph 1 and Source Graph 2, holding File#1 hasIdentification application/zip and File#1 hasIdentification application/msword, with usesTool links to File 5.04 and File 5.07]

Still with me…

• Ok so what about versioning?

[Figure: named-graph File1/Identification/tool/file/version/5.03 with labels File#1, hasIdentification, University of Southampton; named-graph File1/Identification/tool/file/version/5.07 with labels File#1, hasIdentification, application/msword]

Latest

[Figure: /File1/Identification/tool/file/ labelled Latest; named-graph File1/Identification/tool/file/version/5.03 with labels File#1, hasIdentification, University of Southampton; named-graph File1/Identification/tool/file/version/5.07 with labels File#1, hasIdentification, application/msword; a "previous version" link]
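The versioning idea on this slide, a stable "latest" address plus a chain of "previous version" links, can be sketched as a small in-memory store. This is only an illustration under stated assumptions: `GraphVersion`, `publish`, and `history` are hypothetical names, not the LDS3 API, and the identification recorded for version 5.03 is an assumed placeholder value.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class GraphVersion:
    """One immutable version of a named-graph (hypothetical type)."""
    uri: str
    identification: str
    previous: Optional[str] = None  # URI of the version this one replaced


store: Dict[str, GraphVersion] = {}   # version URI -> version
latest: Dict[str, str] = {}           # base URI -> newest version URI


def publish(base: str, version: str, identification: str) -> str:
    """Add a new version and repoint the base URI at it."""
    uri = f"{base}version/{version}"
    store[uri] = GraphVersion(uri, identification, previous=latest.get(base))
    latest[base] = uri
    return uri


def history(base: str) -> List[str]:
    """Walk the previous-version links from newest to oldest."""
    uris = []
    uri = latest.get(base)
    while uri is not None:
        uris.append(uri)
        uri = store[uri].previous
    return uris


BASE = "/File1/Identification/tool/file/"
publish(BASE, "5.03", "application/zip")      # assumed value for illustration
publish(BASE, "5.07", "application/msword")
print(history(BASE))
```

Older versions are never overwritten in this sketch; publishing only adds a new entry and moves the "latest" pointer, so earlier results remain addressable.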

3 Months Ago

www.LDS3.org

• A technical solution to all the complexity, automatic:

• Versioning

• Linking

• Annotation

• Named-Graph Management

• Query Management

Demo

www.LDS3.org

• CRUD

• SWORDv2 (Based Upon)

• OAuth Authentication

In the paper

• Links between P2-Registry, PRONOM and LDS3

• Description of the LDS3 specification

• Overview of software in the LDS3 stack (hardly any of it is new)

• How LDS3 relates to Amazon S3

• More on named-graph versioning

• More on information and non-information resources.

2 Months Ago

Present Day

Presenting the REF

The Results Evaluation Framework

• 5 Tools (Droid, Fits, file, fido, Tika)

• 65 Versions (from 2008 to now)

• 1 Govdocs corpus

• 1 Question….


How accurate are file format identification tools historically?


DOCX

http://data.openplanetsfoundation.org/ref/docx/

Back To The Future

The Future

• Get me the identification for a file as it would have been on 3rd October 2010.

GET /ref/?query="SELECT ?identification where file = X" HTTP/1.1

Accept-Datetime: Sun, 3 Oct 2010 12:00:00 GMT

Accept: text/plain

application/zip
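The request on this slide asks for the answer as it stood at a given moment, using an Accept-Datetime header of the kind defined by the Memento framework. A minimal sketch of the server-side selection is shown below, using only the Python standard library. The function name `identification_as_of` and the dates attached to each identification are hypothetical placeholders chosen for illustration, not results from the REF.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import List, Optional, Tuple

# (time the identification was recorded, identification result).
# These dates are invented placeholders, not measured data.
versions: List[Tuple[datetime, str]] = [
    (datetime(2010, 1, 1, tzinfo=timezone.utc), "application/zip"),
    (datetime(2012, 1, 1, tzinfo=timezone.utc), "application/msword"),
]


def identification_as_of(
    history: List[Tuple[datetime, str]], accept_datetime: str
) -> Optional[str]:
    """Return the newest identification recorded at or before the requested time."""
    when = parsedate_to_datetime(accept_datetime)  # parses the HTTP date format
    candidates = [(t, ident) for t, ident in history if t <= when]
    if not candidates:
        return None  # nothing was known about the file yet
    return max(candidates, key=lambda c: c[0])[1]


print(identification_as_of(versions, "Sun, 3 Oct 2010 12:00:00 GMT"))
```

With the placeholder dates above, a request for 3 October 2010 selects the earlier record, matching the application/zip response shown on the slide.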
