lds 3

SCAPE David Tarrant @davetaz [email protected] Open Planets Foundation / University of Southampton iPres2012 Toronto, October 2012 LDS 3 Applying Preservation Principals to Linked Data Systems This work was partially supported by the SCAPE Project. The SCAPE project is co-funded by the European Union under FP7 ICT-2009.4.1 (Grant Agreement number 270137).

Page 1: LDS 3

SCAPE

David Tarrant @davetaz [email protected]

Open Planets Foundation / University of Southampton

iPres2012 Toronto, October 2012

LDS3

Applying Preservation Principals to Linked Data Systems

This work was partially supported by the SCAPE Project. The SCAPE project is co-funded by the European Union under FP7 ICT-2009.4.1 (Grant Agreement number 270137).

Page 2: LDS 3

SCAPE

Present Day

2

Page 3: LDS 3

SCAPE

Presenting the REF

The Results Evaluation Framework

• 5 Tools (DROID, FITS, file, fido, Tika)

• 65 Versions (from 2008 to now)

• 1 Govdocs corpus

• 1 Question…

3

Page 4: LDS 3

SCAPE

How accurate are file format identification tools historically?

4

Page 5: LDS 3

SCAPE

5

PDF 1.4

Page 6: LDS 3

SCAPE

6

DOCX

Page 7: LDS 3

SCAPE

9 Months Ago

7

Page 8: LDS 3

SCAPE

Why is Data Important?

• Data and Metadata are knowledge.

• Knowledge is power.

• Knowledge enables decision.

• Knowledge enables process.

• Knowledge empowers action.

• Knowledge enables us to say because…

8

Page 9: LDS 3

SCAPE

Processes

9

[Flow chart with elements labelled Process, Decision, and DATA (three times)]

A Classic Flow Chart

Data is key to making decisions

Page 10: LDS 3

SCAPE

Policy

10

[Flow chart with elements labelled Process, Policy, and DATA (three times)]

A Preservation Flow Chart

Data is key to informing policy

Page 11: LDS 3

SCAPE

Policy Data - Generated

• When?

• Who?

• What it affects?

• What action is taken?

• Why?

11

Policy

Page 12: LDS 3

SCAPE

Why?

• Because something said so?

12

• When?

• Who?

• What it affects?

• What action is taken?

• Why?

[Diagram labels: DATA (three times)]

Page 13: LDS 3

SCAPE

Case Study Example (Opinion)

• Due to format obsolescence, all Flash video files are to be migrated to H.264/AAC.

• Input data: a study on the proliferation of Flash and evidence of lacking support from the rights holder, Adobe.

• File B was created from File A a year ago because File A was identified as a Flash video file.

• Today, File A is identified as an Ogg video file.

• What has changed? Why? Does it affect me? Who generated the wrong information? Did they generate any other wrong information?

13

Page 14: LDS 3

SCAPE

I Don’t Know!

14

Page 15: LDS 3

SCAPE

6 Months Ago

15

Page 16: LDS 3

SCAPE

A Fact?

16

[Diagram: File#1 -> hasIdentification -> application/zip]

Page 17: LDS 3

SCAPE

Provenance

• Tarrant, David and Carr, Leslie (2012) LDS3: Applying Digital Preservation Principals to Linked Data Systems. In: Ninth International Conference on Digital Preservation (iPres2012), Toronto, Canada.

17

[Diagram: Tim Berners-Lee -> Provides -> 5-Star Linked Data Guide]

Page 18: LDS 3

SCAPE

Data!!!

• One fact.

• One document the fact comes from.

• One citation about the document's place of publication.

• Who, what, when and where.

• Who they worked for and with.

18

Page 19: LDS 3

SCAPE

Named-Graph

• In Linked Data a document is called a named-graph.

• But these also get used for two purposes!!

19

[Diagram: File#1 -> hasIdentification -> application/zip]

Page 20: LDS 3

SCAPE

The two uses of the named-graph

No. 1 – Data Publication

20

[Diagram: DATA (three times) and a Named-Graph containing File#1 -> hasIdentification -> application/zip]

Page 21: LDS 3

SCAPE

The two uses of the named-graph

No. 2 – Data Discovery/Query

21

[Diagram: DATA (three times) and a Named-Graph; triples shown: File#1 -> hasIdentification -> application/zip and File#1 -> hasIdentification -> application/msword]

Page 22: LDS 3

SCAPE

The two uses of the named-graph

No. 2 – Data Discovery/Query

22

[Diagram: two Named-Graphs, each shown with File#1 -> hasIdentification -> application/zip; a further triple File#1 -> hasIdentification -> application/msword; two links labelled "Works For"]

Page 23: LDS 3

SCAPE

Quads

23

[Diagram: Query Graph, Source Graph 1, and Source Graph 2; triples shown: File#1 -> hasIdentification -> application/zip and File#1 -> hasIdentification -> application/msword]

After all, RDF is a graph model (RDF the spec, not the RDF/XML serialization).

Page 24: LDS 3

SCAPE

Quads

24

[Diagram: Query Graph, Source Graph 1, and Source Graph 2; triples shown: File#1 -> hasIdentification -> application/zip and File#1 -> hasIdentification -> application/msword; two usesTool links, to File 5.04 and File 5.07]
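The quad idea on the two slides above can be sketched in a few lines of Python. This is only an illustration, not LDS3 code: the `Quad` tuple, the graph names, and the pairing of each identification with a tool version are assumptions made for the example (the slides show `File 5.04` and `File 5.07`, but not which result each one produced or how a store lays them out).

```python
from collections import namedtuple

# A quad is a triple plus the name of the graph (document) it came from.
Quad = namedtuple("Quad", ["subject", "predicate", "obj", "graph"])

# Hypothetical source graphs. Which tool version produced which
# identification is an assumption made for this sketch.
QUADS = [
    Quad("File#1", "hasIdentification", "application/zip", "source-graph-1"),
    Quad("source-graph-1", "usesTool", "File 5.04", "source-graph-1"),
    Quad("File#1", "hasIdentification", "application/msword", "source-graph-2"),
    Quad("source-graph-2", "usesTool", "File 5.07", "source-graph-2"),
]


def query_graph(quads, graphs=None):
    """Merge the selected source graphs into one set of plain triples."""
    return {
        (q.subject, q.predicate, q.obj)
        for q in quads
        if graphs is None or q.graph in graphs
    }


def identifications_with_tool(quads, subject):
    """Pair each identification of `subject` with the tool named in its graph."""
    tools = {q.graph: q.obj for q in quads if q.predicate == "usesTool"}
    return sorted(
        (q.obj, tools.get(q.graph))
        for q in quads
        if q.subject == subject and q.predicate == "hasIdentification"
    )
```

Calling `identifications_with_tool(QUADS, "File#1")` keeps the conflicting answers apart by source, which a merged query graph of bare triples cannot do on its own.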

Page 25: LDS 3

SCAPE

Still with me…

• OK, so what about versioning?

25

[Diagram: two named graphs. File1/Identification/tool/file/version/5.03 is shown with File#1, hasIdentification, and University of Southampton; File1/Identification/tool/file/version/5.07 is shown with File#1, hasIdentification, and application/msword]

Page 26: LDS 3

SCAPE

Latest

26

[Diagram: /File1/Identification/tool/file/ with the label Latest; the named graphs File1/Identification/tool/file/version/5.03 (File#1, hasIdentification, University of Southampton) and File1/Identification/tool/file/version/5.07 (File#1, hasIdentification, application/msword) joined by a "previous version" link]
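One way to picture this version chain is as a linked list of named graphs, with a stable URI that always resolves to the newest entry. The sketch below is illustrative only: the `VersionedGraph` class and its methods are invented for this example, and the triples stored for version 5.03 are a placeholder assumption. Only the graph URIs come from the slide.

```python
class VersionedGraph:
    """Keeps every version of a named graph plus a pointer to the latest one."""

    def __init__(self, base_uri):
        self.base_uri = base_uri  # stable URI that stands for "latest"
        self.versions = {}        # version URI -> (triples, previous version URI)
        self.latest = None        # URI of the newest version, if any

    def publish(self, version, triples):
        """Store a new version and link it back to the one it supersedes."""
        uri = f"{self.base_uri}version/{version}"
        self.versions[uri] = (list(triples), self.latest)
        self.latest = uri
        return uri

    def history(self):
        """Follow the previous-version links from newest to oldest."""
        uri = self.latest
        while uri is not None:
            yield uri
            uri = self.versions[uri][1]


graph = VersionedGraph("File1/Identification/tool/file/")
# Placeholder content for 5.03 (assumed, not taken from the slide).
graph.publish("5.03", [("File#1", "hasIdentification", "application/zip")])
graph.publish("5.07", [("File#1", "hasIdentification", "application/msword")])
```

Nothing is overwritten here: publishing 5.07 leaves 5.03 in place, so an earlier answer can still be retrieved and compared.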

Page 27: LDS 3

SCAPE

3 Months Ago

27

Page 28: LDS 3

SCAPE

www.LDS3.org

• A technical solution to all the complexity, with automatic:

• Versioning

• Linking

• Annotation

• Named-Graph Management

• Query Management

28

Page 29: LDS 3

SCAPE

Demo

29

Page 30: LDS 3

SCAPE

www.LDS3.org

• CRUD

• SWORDv2 (Based Upon)

• OAuth Authentication

30

Page 31: LDS 3

SCAPE

In the paper

• Links between P2-Registry, Pronom and LDS3

• Description of the LDS3 specification

• Overview of software in the LDS3 stack (hardly any of it is new)

• How LDS3 relates to Amazon S3

• More on named-graph versioning

• More on information and non-information resources.

31

Page 32: LDS 3

SCAPE

2 Months Ago

32

Page 34: LDS 3

SCAPE

34

Page 35: LDS 3

SCAPE

35

Present Day

Page 36: LDS 3

SCAPE

Presenting the REF

The Results Evaluation Framework

• 5 Tools (DROID, FITS, file, fido, Tika)

• 65 Versions (from 2008 to now)

• 1 Govdocs corpus

• 1 Question…

36

Page 37: LDS 3

SCAPE

How accurate are file format identification tools historically?

37

Page 39: LDS 3

SCAPE

39

DOCX

http://data.openplanetsfoundation.org/ref/docx/

Page 40: LDS 3

SCAPE

40

Back To The Future

Page 41: LDS 3

SCAPE

The Future

• Get me the identification for a file as it would have been on 3rd October 2010.

GET /ref/?query=“SELECT ?identification where file = X” HTTP/1.1

Accept-Datetime: Sun, 03 Oct 2010 12:00:00 GMT

Accept: text/plain

application/zip

41
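The request above pairs a query with an `Accept-Datetime` header (the same header name used by the Memento protocol, RFC 7089). A minimal Python sketch of both halves follows: formatting the header value, and picking the result that was current at that moment. The publication dates and the `resolve_at` helper are hypothetical; the slide gives neither release dates nor server logic.

```python
from bisect import bisect_right
from datetime import datetime, timezone
from email.utils import format_datetime


def accept_datetime(moment):
    """Format an aware datetime as an HTTP date for Accept-Datetime."""
    return format_datetime(moment.astimezone(timezone.utc), usegmt=True)


def resolve_at(versions, moment):
    """Return the value of the newest version published at or before `moment`.

    `versions` is a list of (published, value) pairs sorted by `published`.
    Returns None when nothing had been published yet.
    """
    published = [stamp for stamp, _ in versions]
    index = bisect_right(published, moment)
    return versions[index - 1][1] if index else None


# Hypothetical publication dates for two identification results.
VERSIONS = [
    (datetime(2010, 1, 1, tzinfo=timezone.utc), "application/zip"),
    (datetime(2011, 6, 1, tzinfo=timezone.utc), "application/msword"),
]

WHEN = datetime(2010, 10, 3, 12, 0, 0, tzinfo=timezone.utc)
HEADERS = {"Accept-Datetime": accept_datetime(WHEN), "Accept": "text/plain"}
```

With these assumed dates, `resolve_at(VERSIONS, WHEN)` selects the earlier result, matching the `application/zip` response on the slide.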

Page 42: LDS 3

SCAPE

David Tarrant @davetaz [email protected]

Open Planets Foundation / University of Southampton

iPres2012 Toronto, October 2012

LDS3

Applying Preservation Principals to Linked Data Systems

This work was partially supported by the SCAPE Project. The SCAPE project is co-funded by the European Union under FP7 ICT-2009.4.1 (Grant Agreement number 270137).