
Versioning of Digital Objects in a Fedora-based Repository

Matthias Razum

FIZ Karlsruhe

DORSDL Workshop

Alicante

September 21, 2006

ECDL – DORSDL Workshop, Alicante, September 21, 2006

Outline

• Motivation
• Versioning Concepts in eSciDoc
• Content Models
• Technical Approach
• Conclusion

Project Setup and Mission

• eSciDoc is a joint project of the Max Planck Society (MPS) and FIZ Karlsruhe

• €6 million five-year grant (2004–2009) from the German Federal Ministry of Education and Research

• It aims to build an integrated information, communication and publishing platform for web-based scientific work, demonstrated on multidisciplinary applications within the MPS

• eSciDoc is not a mere research project; it aims to establish an innovative production system


Repositories for eScience

• The contents of an institutional repository or a digital library form the ‘institutional memory’ of an organization

• And just like human memory, they should allow for associating information objects in novel contexts, thus creating new scholarship

• Interdisciplinary work is becoming increasingly important, so systems have to span scientific disciplines

• Repositories should be open, application-independent and flexible, thus laying the groundwork today for repurposing the information in future applications


Turning Static Objects into ‘Living’ Knowledge

• e-Scholarship allows scholars to publish all intermediate results of knowledge generation, from first ideas, theories, and discussions with peers to final results

• Institutional repositories and digital libraries need to support scholars from the earliest steps of this process, enabling them to share their work in progress with peers

• Thinking a step further leads to interactive authoring environments with support for collaboration and annotations

• As a result, objects lose their static nature and become ‘active nodes’ in a network of knowledge


Implications

• The concept of ‘ownership’ of an artifact is loosened and partly replaced by an ongoing authoring process which spans persons, places, and time

• Collaborative authoring raises an issue familiar to software developers: versioning of digital objects

• All intermediate or working versions of artifacts should become part of the repository, not just the final versions

• Good Scientific Practice requires versioning and provenance data for objects

Section: Versioning Concepts in eSciDoc


Versioning on Object Level

• Fedora’s basic object model – as defined in FOXML – is composed of an identifier, some key descriptive properties and a set of datastreams

• Currently, each change to a datastream leads to a new version of the datastream, but not of the object itself.

• On the other hand, authors and editors perceive objects as one coherent entity, not as a set of datastreams.

• They request a ‘whole-object’ versioning which complies with their mental model.
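The gap between datastream-level and whole-object versioning can be sketched in Python. This is a minimal illustration under stated assumptions, not Fedora's or eSciDoc's API: the classes `Datastream` and `DigitalObject`, the method `snapshot`, and the PID `demo:1` are hypothetical, and timestamps are plain strings. Each datastream keeps its own version list, and a whole-object version is simply a recorded snapshot of which datastream version was current.

```python
from dataclasses import dataclass, field


@dataclass
class Datastream:
    """A datastream that keeps all of its versions (hypothetical model)."""
    dsid: str
    versions: list = field(default_factory=list)  # (timestamp, content) pairs

    def modify(self, timestamp: str, content: bytes) -> None:
        # A change creates a new datastream version; nothing is recorded on the object.
        self.versions.append((timestamp, content))

    def latest_timestamp(self) -> str:
        return self.versions[-1][0]


@dataclass
class DigitalObject:
    """A digital object with an extra list of whole-object versions (hypothetical)."""
    pid: str
    datastreams: dict = field(default_factory=dict)
    object_versions: list = field(default_factory=list)  # (label, {dsid: timestamp})

    def snapshot(self, label: str) -> dict:
        # Freeze the current version of every datastream into one coherent object version.
        state = {dsid: ds.latest_timestamp() for dsid, ds in self.datastreams.items()}
        self.object_versions.append((label, state))
        return state


obj = DigitalObject("demo:1")
obj.datastreams["DC"] = Datastream("DC")
obj.datastreams["CONTENT"] = Datastream("CONTENT")
obj.datastreams["DC"].modify("2006-05-10T12:21:57Z", b"<dc/>")
obj.datastreams["CONTENT"].modify("2006-05-10T12:21:57Z", b"draft")
obj.snapshot("1.0")
obj.datastreams["CONTENT"].modify("2006-08-11T09:23:09Z", b"revised")
obj.snapshot("1.1")  # DC is unchanged, so version 1.1 still points at its first version
```

The snapshot stores references to datastream versions rather than copies of their content, so it sits on top of per-datastream versioning instead of replacing it.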


Fixed and Floating Object References

• Scholarly work relies heavily on citations and external references to existing material (e.g. primary data and supplementary material)

• In digital repositories, these associations are expressed as object relations.

• Versioning objects then raises the question of how to handle relations that point to a versioned object.

• eSciDoc implements two approaches: fixed relations, which point to exactly one given version of an object, and floating relations, which always point to the latest version of an object.
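A minimal Python sketch of how the two kinds of relation might be resolved. The `Relation` class, the `resolve` function, the in-memory `HISTORY` table, and the PID `item:42` are hypothetical illustrations, not eSciDoc's actual data model: a relation without a version is treated as floating, and a relation with a version is treated as fixed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Relation:
    """Hypothetical reference: version=None means floating, a version ID means fixed."""
    target_pid: str
    version: Optional[str] = None


# Hypothetical version history: PID -> ordered list of whole-object version IDs
HISTORY = {"item:42": ["1.0", "1.1", "1.2"]}


def resolve(rel: Relation, history: dict = HISTORY) -> tuple:
    versions = history[rel.target_pid]
    if rel.version is None:
        # Floating relation: follows the object as it evolves.
        return rel.target_pid, versions[-1]
    if rel.version not in versions:
        raise KeyError(f"{rel.target_pid} has no version {rel.version}")
    # Fixed relation: always the cited state, regardless of later changes.
    return rel.target_pid, rel.version


fixed = Relation("item:42", "1.1")
floating = Relation("item:42")
print(resolve(fixed), resolve(floating))
HISTORY["item:42"].append("1.3")  # the target object gets a new version
print(resolve(fixed), resolve(floating))
```

After the new version is added, the fixed relation still resolves to 1.1, which is what a citation needs, while the floating relation moves on to 1.3.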


Internal and Public Versions

• Versions represent intermediate stages of work and are visible only to the authors of digital objects

• Revisions are published versions of objects with persistent identifiers.

• Creating a revision is an intellectual step that usually includes some form of quality assurance, whereas versioning is an automated process.
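The split between automatic versions and deliberate revisions can be sketched as follows. This is a hypothetical illustration, not eSciDoc code: `VersionedObject`, `save`, `publish`, and the `quality_checked` flag are invented names, and `x.y/rev:1` is the placeholder DOI used later in this talk. Every save creates a version; only an explicit, quality-checked publish step attaches a persistent identifier and makes that version publicly visible.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Version:
    version_id: str
    revision_pid: Optional[str] = None  # set only when this version becomes a revision


@dataclass
class VersionedObject:
    versions: list = field(default_factory=list)

    def save(self) -> Version:
        """Automated step: every save creates a new internal version."""
        v = Version(f"1.{len(self.versions)}")
        self.versions.append(v)
        return v

    def publish(self, pid: str, quality_checked: bool) -> Version:
        """Intellectual step: tag the current version as a revision with a persistent ID."""
        if not quality_checked:
            raise ValueError("quality assurance must precede publishing a revision")
        latest = self.versions[-1]
        latest.revision_pid = pid
        return latest

    def visible_versions(self, is_author: bool) -> list:
        # Authors see every version; everyone else sees only revisions.
        return [v.version_id for v in self.versions if is_author or v.revision_pid]


doc = VersionedObject()
doc.save()
doc.save()
doc.publish("x.y/rev:1", quality_checked=True)
doc.save()
print(doc.visible_versions(is_author=True))
print(doc.visible_versions(is_author=False))
```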


Container Objects

• eSciDoc allows the grouping of objects by means of container objects like collections or bundles.

• A change to one of the contained objects substantially changes the container object as well. Therefore, any change to a contained object should lead to a new version of the container object.

• The same applies to revisions: container objects are citable objects with their own persistent identifiers. A new revision of a contained object forces a new revision of the container object too.
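A minimal sketch of the propagation rule, assuming (hypothetically) that a container version is a map from member PID to the member version it includes, and that a member revision always yields a new container version which is then marked as a revision. `Container`, `member_changed`, `member_revised`, and the PIDs are illustrative names, not eSciDoc's API.

```python
from dataclasses import dataclass, field


@dataclass
class Container:
    pid: str
    # Each container version maps member PIDs to the member version it includes.
    versions: list = field(default_factory=list)
    revisions: list = field(default_factory=list)  # indices into versions

    def member_changed(self, member_pid: str, member_version: str) -> int:
        """A changed or newly added member produces a new container version."""
        current = dict(self.versions[-1]) if self.versions else {}
        current[member_pid] = member_version
        self.versions.append(current)
        return len(self.versions) - 1

    def member_revised(self, member_pid: str, member_version: str) -> int:
        """A new member revision forces a container version that is itself a revision."""
        index = self.member_changed(member_pid, member_version)
        self.revisions.append(index)
        return index


c = Container("collection:1")
c.member_changed("item:1", "1.0")
c.member_changed("item:2", "1.0")
c.member_revised("item:1", "1.1")
print(c.versions, c.revisions)
```

Copying the previous member map before updating it keeps earlier container versions intact, so each one still describes exactly which member versions it contained.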

Section: Content Models


Content Models in General

• An important part of implementing a Fedora repository is modeling the different classes or “genres” of digital objects that will be created, stored, and managed in the repository.

• A content model will typically describe the following:
  – Datastream composition
    • the number and kinds of datastreams that must be present in the digital object
    • the format(s) for those datastreams, either MIME types or format identifiers
    • whether each kind of datastream is required or optional
    • whether each kind of datastream has cardinality constraints
  – Semantic identifiers for each kind of datastream
  – Relationships
    • in cases where a content model is a “graph” of related content models
  – Disseminators (optional)
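The datastream-composition part of a content model can be sketched as a validator. This is a hypothetical Python illustration, not Fedora's content model format: `DatastreamRule`, `validate`, the regex-based datastream IDs (`ESCIDOC-MD`, `WOV-MD`, `MD1` ... `MDn`), and the MIME types are assumptions chosen to mirror the Content Item layout. It checks required/optional status, cardinality, formats, and undeclared datastreams.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DatastreamRule:
    pattern: str                  # regex for datastream IDs, e.g. r"MD\d+" for MD1 ... MDn
    mime_types: frozenset         # allowed MIME types
    min_count: int = 1            # 0 makes the datastream optional
    max_count: Optional[int] = 1  # None means unbounded ("*")


def validate(model: list, datastreams: dict) -> list:
    """Check a {dsid: mime_type} mapping against a content model; return violations."""
    errors, covered = [], set()
    for rule in model:
        matched = [d for d in datastreams if re.fullmatch(rule.pattern, d)]
        covered.update(matched)
        if len(matched) < rule.min_count:
            errors.append(f"{rule.pattern}: expected at least {rule.min_count}, found {len(matched)}")
        if rule.max_count is not None and len(matched) > rule.max_count:
            errors.append(f"{rule.pattern}: expected at most {rule.max_count}, found {len(matched)}")
        for d in matched:
            if datastreams[d] not in rule.mime_types:
                errors.append(f"{d}: MIME type {datastreams[d]} not allowed")
    for d in sorted(set(datastreams) - covered):
        errors.append(f"{d}: not defined by the content model")
    return errors


# Hypothetical model loosely following the Content Item layout (IDs are illustrative).
CONTENT_ITEM = [
    DatastreamRule("RELS-EXT", frozenset({"application/rdf+xml"})),
    DatastreamRule("ESCIDOC-MD", frozenset({"text/xml"})),
    DatastreamRule(r"MD\d+", frozenset({"text/xml"}), min_count=0, max_count=None),
    DatastreamRule("WOV-MD", frozenset({"text/xml"})),
]

good = {"RELS-EXT": "application/rdf+xml", "ESCIDOC-MD": "text/xml",
        "MD1": "text/xml", "WOV-MD": "text/xml"}
bad = {"RELS-EXT": "application/rdf+xml", "MD1": "image/png"}
print(validate(CONTENT_ITEM, good))
print(validate(CONTENT_ITEM, bad))
```

Using regular expressions for datastream IDs is one simple way to express "MD1 ... MDn" with unbounded cardinality; a real content model language would define its own syntax.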

Structural View of Content Item

[Diagram: relations among Content Item, Content Component, License, CC License, Metadata, EssentialProperties, eSciDoc Metadata and CC Metadata, labeled hasRevision (*), hasComponent (*), hasLicense (*), hasMD (* and 1), hasProperties (1) and hasDefaultMD (1)]

Content Item Modeled as Fedora Object

Content Item (Fedora object), datastreams: RELS-EXT, eSciDoc MD, MD1 ... MDn, WOV MD
  hasComponent (*) → Content Component (Fedora object), datastreams: RELS-EXT, CC MD, License1 ... Licensen, Content Stream

Container Modeled as Fedora Object

Container (Fedora object), datastreams: RELS-EXT, eSciDoc MD, MD1 ... MDn, Structure Map, WOV MD
  hasMember (*) → Content Item (Fedora object), datastreams: RELS-EXT, eSciDoc MD, MD1 ... MDn, WOV MD

Section: Technical Approach


Whole-Object Versioning Metadata

• Fedora versions datastreams automatically within each object

• The eSciDoc middleware keeps track of whole-object versions via objectVersion metadata

• The eSciDoc middleware can also tag particular whole-object versions as “revisions”, which serve as the official published views of the object
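A sketch of how middleware might write one objectVersion entry, using Python's standard `xml.etree.ElementTree`. The element and attribute names (`objectVersion`, `versionID`, `revisionID`, `comment`, `component`, `PID`, `dateTime`) follow the Object Version XML slide; the function `object_version` and its parameters are hypothetical, and where the entries are stored (presumably the WOV MD datastream) is not modeled here.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def object_version(version_id: str, comment: str, components: dict,
                   revision_id: Optional[str] = None) -> ET.Element:
    """Build one objectVersion element (hypothetical helper).

    components maps each component PID to the dateTime of the component
    version included in this whole-object version.
    """
    attrs = {"versionID": version_id}
    if revision_id is not None:
        attrs["revisionID"] = revision_id  # only versions tagged as revisions carry this
    elem = ET.Element("objectVersion", attrs)
    ET.SubElement(elem, "comment").text = comment
    for pid, timestamp in components.items():
        ET.SubElement(elem, "component", {"PID": pid, "dateTime": timestamp})
    return elem


v = object_version("1.0", "this is the first whole object version",
                   {"child:5": "2006-05-10T12:21:57Z", "child:6": "2006-05-10T12:21:57Z"})
r = object_version("1.1", "child:5 is the same; child:6 modified; child:7 ingested",
                   {"child:5": "2006-05-10T12:21:57Z", "child:6": "2006-08-11T09:23:09Z",
                    "child:7": "2006-08-11T09:23:09Z"},
                   revision_id="doi:10.11.1234")
print(ET.tostring(r, encoding="unicode"))
```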

Animated View

Whole-object versions of content item parent:1 with content components CC1, CC2, CC3 (PIDs child:1, child:2, child:3) over the timeline t0 to t4 (-- means none):

Time  VersionID  DOI        child:1  child:2  child:3
t0    1.0        --         t0       t0       --
t1    1.1        --         t0       t1       --
t2    1.2        --         t0       t1       t2
t3    1.3        x.y/rev:1  t0       t1       t2
t4    1.4        --         t4       t1       t2

Version 1.3 (DOI x.y/rev:1) is the revision.
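The timeline in the animated view can be replayed in a few lines of Python. This is a hypothetical sketch: the event list encodes the changes inferred from the slide (which components changed at each time point, and where the DOI was assigned), and `replay` simply bumps the minor version on every event. The revision at t3 changes no component; it only attaches the DOI to a new whole-object version.

```python
# Hypothetical event log mirroring the animated view:
# (time, component PIDs changed at that time, DOI assigned or None)
EVENTS = [
    ("t0", ["child:1", "child:2"], None),  # item created with two components
    ("t1", ["child:2"], None),             # child:2 modified
    ("t2", ["child:3"], None),             # child:3 added
    ("t3", [], "x.y/rev:1"),               # revision: DOI assigned, no component changes
    ("t4", ["child:1"], None),             # child:1 modified after the revision
]


def replay(events):
    """Return (versionID, DOI, {component PID: component version}) per whole-object version."""
    state = {}
    history = []
    for minor, (time, changed, doi) in enumerate(events):
        for pid in changed:
            state[pid] = time  # a component's current version is the time of its last change
        history.append((f"1.{minor}", doi, dict(state)))
    return history


for version_id, doi, components in replay(EVENTS):
    print(version_id, doi or "--", components)
```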


Object Version XML

<objectVersion versionID="1.0">
  <comment>this is the first whole object version</comment>
  <component PID="child:5" dateTime="2006-05-10T12:21:57Z"/>
  <component PID="child:6" dateTime="2006-05-10T12:21:57Z"/>
</objectVersion>
<objectVersion versionID="1.1" revisionID="doi:10.11.1234">
  <comment>child:5 is the same; child:6 modified; child:7 ingested</comment>
  <component PID="child:5" dateTime="2006-05-10T12:21:57Z"/>
  <component PID="child:6" dateTime="2006-08-11T09:23:09Z"/>
  <component PID="child:7" dateTime="2006-08-11T09:23:09Z"/>
</objectVersion>
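Because each objectVersion entry lists the exact component states it includes, the difference between two whole-object versions can be computed mechanically. A minimal Python sketch using the standard `xml.etree.ElementTree`; the `<objectVersions>` root element is a hypothetical wrapper added so the two entries form one well-formed document, comments are omitted, and the helpers `components` and `diff` are illustrative names.

```python
import xml.etree.ElementTree as ET

# The two entries above, wrapped in a hypothetical <objectVersions> root.
WOV = """<objectVersions>
  <objectVersion versionID="1.0">
    <component PID="child:5" dateTime="2006-05-10T12:21:57Z"/>
    <component PID="child:6" dateTime="2006-05-10T12:21:57Z"/>
  </objectVersion>
  <objectVersion versionID="1.1" revisionID="doi:10.11.1234">
    <component PID="child:5" dateTime="2006-05-10T12:21:57Z"/>
    <component PID="child:6" dateTime="2006-08-11T09:23:09Z"/>
    <component PID="child:7" dateTime="2006-08-11T09:23:09Z"/>
  </objectVersion>
</objectVersions>"""


def components(version: ET.Element) -> dict:
    """Map component PID -> dateTime for one objectVersion element."""
    return {c.get("PID"): c.get("dateTime") for c in version.findall("component")}


def diff(old: ET.Element, new: ET.Element) -> dict:
    """Classify components as unchanged, modified, added, or removed."""
    a, b = components(old), components(new)
    common = a.keys() & b.keys()
    return {
        "unchanged": sorted(p for p in common if a[p] == b[p]),
        "modified": sorted(p for p in common if a[p] != b[p]),
        "added": sorted(b.keys() - a.keys()),
        "removed": sorted(a.keys() - b.keys()),
    }


root = ET.fromstring(WOV)
v10, v11 = root.findall("objectVersion")
print(v11.get("revisionID"), diff(v10, v11))
```

The computed diff reproduces the human-written comment on version 1.1: child:5 unchanged, child:6 modified, child:7 ingested.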

Section: Conclusion


Conclusion

• Versioning is essential for repositories which cover the whole object lifecycle

• Fedora already comes with a powerful versioning mechanism, but cannot fulfill all requirements of eSciDoc

• Atomistic content models make versioning even more complex

• The proposed approach meets advanced versioning requirements and at the same time demonstrates Fedora’s flexibility and adaptability


Acknowledgements

The concepts in this presentation are based on

• eSciDoc’s Logical Data Model, created by Natasa Bulatovic (ZIM, Max Planck Society)

• a joint workshop of ZIM and FIZ with Sandy Payette and Carl Lagoze


Questions
