DELIVERABLE 5.1.1

Initial report on preservation ecosystem management

© PERICLES Consortium Page 1 / 93

PERICLES - Promoting and Enhancing Reuse of Information throughout the Content Lifecycle taking account of Evolving Semantics

[Digital Preservation]

Project co-funded by the European Commission within the Seventh Framework Programme

(2007-2013)

Dissemination level

| Code | Level | Selected |
| --- | --- | --- |
| PU | PUBLIC | X |
| PP | Restricted to other PROGRAMME PARTICIPANTS (including the Commission Services) | |
| RE | RESTRICTED to a group specified by the consortium (including the Commission Services) | |
| CO | CONFIDENTIAL only for members of the consortium (including the Commission Services) | |

Revision History

| V # | Date | Description / Reason of change | Author |
| --- | --- | --- | --- |
| V0.9 | 30/06/14 | Editing for internal review | JL |
| V0.9.1 | 30/06/14 | Editing of example and steps | JB |
| V0.9.2 | 30/06/14 | Version for internal review | JL |
| V0.9.3 | 02/07/14 | Correction of example, small changes | JB |
| V0.9.4 | 08/07/14 | Added Bibliography, updated glossary, formatting, captions | JB |
| V0.9.5 | 11/07/14 | Integration of DDWs comments | JB |
| V0.9.6 | 16/07/14 | Integration AH comments and changes | JB |
| V0.9.7 | 18/07/14 | Restructure State of the Art chapters | JB |
| V0.9.9 | 30/07/14 | Final version | JL/JB |

Authors and Contributors

Contributors at this WP are the end user partners. WP1 provides feedback to technical partners.

Authors

| Partner | Name |
| --- | --- |
| UGOE | Johannes Biermann (JB) |
| UGOE | Jens Ludwig (JL) |
| UEDIN | Adam Carter (AC) |
| KCL | Simon Waddington (SW) |
| KCL | Alastair Gill (AG) |

Contributors

| Partner | Name |
| --- | --- |
| ULIV | Fabio Corubolo (FC) |
| ULIV | Adil Hasan (AH) |
| XEROX | Jean-Yves Vion-Dury (JYVD) |
| XEROX | Jean-Pierre Chanod (JPC) |
| KCL | Mark Hedges (MH) |

Contents

1. Executive Summary
2. Introduction & Rationale
2.1. Context of this Deliverable Production
2.1.1. Relation to other Work Package 5 tasks
2.1.2. Relation to other Work Packages from WP5 view
2.2. What to expect from this Document
2.3. Document Structure
3. Scenarios for a digital ecosystem model
4. Detailed description of the model
4.1. Digital ecosystem
4.1.1. State of the art
4.2. Dependencies reference model
4.3. Entities of the digital ecosystem
4.3.1. Digital Object
4.3.2. Policy
4.3.3. Process
4.3.4. Technical Service
4.3.5. User community
4.4. Change
4.4.1. Definition
4.4.2. Overview of types of change
4.4.3. State of the art
4.5. Dependencies
4.5.1. Definition
4.5.2. Types of dependencies
4.5.3. State of the art
5. Modelling of the digital ecosystem
5.1. Preparatory tasks
5.2. Identify user activities
5.2.1. Restructure activity to the user perspective if necessary
5.2.2. Refine activity description
5.3. Identify entity types
5.4. Model the dependencies
5.4.1. Graphical notation
5.4.2. Vertex properties
5.4.3. Edge properties
5.5. Possible reasoning from the graph
6. Exemplary application
6.1. First reasoning on the graph
7. Conclusion, recommendations and outlook for digital ecosystem management
8. Bibliography
9. Annex A: Entity descriptions
10. Annex B: Stakeholder roles

Figures

Figure 1: Relation between WP5 tasks
Figure 2: Preliminary visualisation of digital ecosystem entities with exemplary interactions (grey arrows). Red arrows symbolize examples of change of an entity which can affect dependencies and interactions with other entities.
Figure 3: Dependencies reference model
Figure 4: Policy layers and possible notations for each layer
Figure 5: Overview types of change
Figure 6: Types of dependencies
Figure 7: example of TIMBUS vocabulary to describe the surrounding of a digital image
Figure 8: Science example landscape expressed with ArchiMate
Figure 9: Science example landscape expressed with a SysML block diagram
Figure 10: Scientist user activity expressed as BPMN
Figure 11: science dependencies expressed as property graph

Tables

Table 1: usage cases based on graph analysis
Table 2: List of basic graph manipulations that do not have a significant impact on the digital ecosystem
Table 3: Possible manipulations with tools working on the digital ecosystem model
Table 4: Abstract usage cases as summary from the tables before
Table 5: Preliminary properties of the entity Digital Object
Table 6: Preliminary properties of the entity policy
Table 7: Preliminary properties of the entity Process
Table 8: summary of workflow languages, open source BPM and workflow tools and scientific workflow tools
Table 9: Preliminary properties of the entity Technical Service
Table 10: Classification of change types
Table 11: description of dependencies
Table 12: Comparison of dependencies of the five entity types Digital Object, User, Technical Service, Process and Policy
Table 13: Comparison of change of the five entity types Digital Object, User, Technical Service, Process and Policy
Table 14: Comparison of lifecycles of the five entity types Digital Object, User, Technical Service, Process and Policy
Table 15: Stakeholder roles from WP2 science and art and media

Glossary

Abbreviation / Acronym Meaning

ArchiMate Open enterprise architecture modelling language from the Open Group. Contains several notations to model enterprise architecture, see also enterprise modelling.

ARIS Architecture of Integrated Information Systems is an approach for enterprise modelling.

BPEL Business Process Execution Language is an XML-based format for describing business processes via web services. See also YAWL.

BPM Business Process Management is an approach for consistently improving processes and business activities of an organisation.

BPMN Business Process Model and Notation is a graphical notation of a business process.

BPMS A Business Process Management System is software that executes business processes. It can contain several components: a workflow system, dashboards, reporting and other tools.

DCC Digital Curation Centre, an institution located in the UK.

DDI Data Documentation Initiative is an effort to create an international XML standard for describing data from the social, behavioural, and economic sciences.

DO Digital Object, see chapter 4.3.1 Digital Object.

Enterprise Architecture (EA) modelling Enterprise modelling helps to model the goals, processes, structure and other aspects of an entire organisation. Various textual and graphical representations may be included.

EPC Event driven process chain provides a graphical way to model business processes.

ICAM Integrated Computer Aided Manufacturing

IDEF0 ICAM Definition Language 0 is used to model the decisions and activities of systems (e.g. business processes).

LTA Long term archiving is a discipline where (electronic) artefacts should be kept for a future use without any

OAI-ORE The Open Archives Initiative Object Reuse and Exchange format is a specification for exchange of aggregated distributed resources with multiple entities like text, image, video and data.

OCL The Object Constraint Language is a textual language for describing rules for UML models.

PDL The Policy Description Language is a formal language for describing policies in the event-condition-action form.

POL Policy, see chapter 4.3.2 Policy.

PS Process; a process is a series of coupled activities that can take place in any domain (technical, biological, chemical, etc.).

RIF The Rule Interchange Format is an open standard for exchanging rules between rule engines from different vendors.

RIM Records Information Management is the management and administration of all relevant records of an organisation throughout the records lifecycle.

RM Records Management, see RIM.

RPC A Remote Procedure Call allows executing functions or routines of computer programs over computer networks. There are a lot of RPC standards.

SAP SAP is a German software company mainly focused on enterprise software.

TS Technical Service, see chapter 4.3.4 Technical Service.

UML Unified Modelling Language

US User stories, short form of a use case (in the form of I want to do x to get result y).

US User, see chapter 4.3.5 User community.

WF Workflow; a workflow is a set of connected activities that is executed within an organisation. It is a formal description of work carried on by a person or a group of persons.

XACML The eXtensible Access Control Markup Language makes it possible to express access control policies in XML.

YAWL Yet Another Workflow Language is an XML-based language that describes executable workflows. See also BPEL.

1. Executive Summary

This document introduces the concept of a digital ecosystem as an approach to conceptualize digital preservation. In comparison to other approaches, the digital ecosystem concept explicitly tries to capture dependencies involving non-technical aspects in addition to technical ones. In the current version the digital ecosystem is considered to consist of entities which are subject to different kinds of change and which are connected by different kinds of dependencies. By explicitly modelling the digital ecosystem as a dependency graph, the preservation of key features can be planned, managed and supported by tools. The term ecosystem was chosen to reflect an analogy with biological systems, where a complex set of relationships exists between living resources and their environment, and where adding, removing or changing one resource can potentially have an impact on the other resources.

In this initial report the entities considered as part of a digital ecosystem are the Users (or User Communities), the Digital Objects, the Policies, the Processes, and the Technical Services/Systems. Each type of entity has specific properties and dependencies and is subject to specific kinds of change events and lifecycles. Based on an analysis of these entities, some more general characteristics of change and dependencies which need to be modelled can be described. A formal model can be created from these entities and dependency relations by following a step-by-step procedure.

Later iterations and extensions of this report will refine the modelling process, the entity and dependency descriptions and define how this high-level digital ecosystem model can be translated and implemented with the lower-level linked resource model (LRM) developed in PERICLES. The PERICLES tools will help in creating, managing and analysing digital ecosystem models. Also out of scope for this initial report is a specification of how the semantic evolution which can affect each type of entity will be modelled.

2. Introduction & Rationale

2.1. Context of this Deliverable Production

This deliverable presents the initial version of the PERICLES model for managing preservation ecosystems and semantic evolution, and summarizes the work of task T5.1. It describes what a "digital ecosystem" is in the context of PERICLES, what the basic components of a digital ecosystem are and how they relate to each other. It defines from a high-level view how a digital ecosystem should be modelled and is complementary to the activities of other work packages (WP) and other tasks of WP5.

2.1.1. Relation to other Work Package 5 tasks

Figure 1: Relation between WP5 tasks

T5.1 Enhance lifecycle and preservation models to manage preservation ecosystems and semantic evolution

This task produces this initial version of the PERICLES WP5 model. It is also an ongoing task that will later extend the initial model to cover semantic change of entities.

T5.2 Develop registries and tools for preservation ecosystem management

The aim of the registries and tools for preservation ecosystem management is to store formalized models of an ecosystem and to support the analysis and the execution by a preservation system. A focus of the registry will be policies and process definitions.

T5.3 Develop processes for preservation ecosystems

Here a system architecture for a crucial part of the digital ecosystem is developed: the chain of policies and their implementing processes/workflows, which are implemented by specific services. Part of this task is to provide the ability to execute the processes. Additionally generic prototype processes for digital preservation will be designed which will populate the default version of the registry.

T5.4 Develop quality assurance methods for preservation ecosystem management and semantic evolution

The aim of this task is to extend the primarily static analysis and view of the ecosystem to include also methods for quality assurance and change management in ecosystems. A special focus will be on policies and processes including conflict detection, user and semantic change.

T5.5 Support functionality for appraisal processes

This task models appraisal processes and approaches to support appraisal decisions. In addition, prototype tools will be developed to capture, extract and embed appraisal information from and into digital objects.

2.1.2. Relation to other Work Packages from WP5 view

WP2 will provide scenarios for which a digital ecosystem view will be exemplarily modelled. This will include dependencies, changes and the derivation of processes from policies. Scenarios from the Art Media case studies will be modelled as workflow and lifecycle models. Later quality assurance tasks and potential solutions will be identified from case studies.

WP3 develops the Linked Resource Model (LRM) as a basic ontology for modelling dependencies between abstract resources. The digital ecosystem model which is initially described in this document on the other hand aims to provide the tools to describe specific dependencies in a digital ecosystem from a preservation perspective. The aim of the WP5 model is to be consistent with the WP3 approach so that the LRM can be used to express the digital ecosystem model and its evolution and change management.

WP4: WP5 tries to describe the digital ecosystem as a whole with all entities and dependencies which are relevant. WP4 takes in many aspects the opposite viewpoint. It focuses on an individual digital object and tries to capture its relevant dependencies (the significant environment) with tools. In very simplistic scenarios and under perfect conditions both approaches should basically lead to the same information. In practice it is highly unlikely that all kinds of information relevant for using the digital objects from a long-term perspective can be captured. On the other hand it is impractical to model a digital ecosystem with the level of detail that an automatic software tool can capture. But it will be investigated whether the WP4 tools for capturing information about the significant environment of a digital object can be used to produce part of the digital ecosystem model.

2.2. What to expect from this Document

This document gives a guideline on how to model a digital ecosystem as a dependency graph. It shows which entities are included in a digital ecosystem, what characteristics they have and which parameters need to be captured to model them.

The resulting graph models the digital ecosystem from the activities perspective: the modelling starts from the typical actions that the user communities perform on the system. The dependency graph shows what needs to be done inside the digital ecosystem to perform these activities.
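As an illustration only, a dependency graph of this kind could be held in a small data structure such as the following Python sketch. The class names, the edge direction ("a depends on b"), the `label` edge property and the example entities are assumptions made for this sketch; they are not the notation defined later in this report.

```python
from dataclasses import dataclass, field

# The five entity types named in this report.
ENTITY_TYPES = {
    "User Community", "Digital Object", "Policy", "Process", "Technical Service",
}


@dataclass(frozen=True)
class Entity:
    """A vertex of the dependency graph (hypothetical structure)."""
    name: str
    entity_type: str

    def __post_init__(self):
        if self.entity_type not in ENTITY_TYPES:
            raise ValueError(f"unknown entity type: {self.entity_type}")


@dataclass
class DependencyGraph:
    """Directed graph; an edge (a, b) is read as 'a depends on b'."""
    entities: dict = field(default_factory=dict)  # name -> Entity
    edges: dict = field(default_factory=dict)     # (source, target) -> properties

    def add_entity(self, name, entity_type):
        self.entities[name] = Entity(name, entity_type)

    def add_dependency(self, source, target, **properties):
        for name in (source, target):
            if name not in self.entities:
                raise KeyError(f"unknown entity: {name}")
        self.edges[(source, target)] = properties

    def dependencies_of(self, name):
        """Names of the entities that `name` depends on directly."""
        return sorted(t for (s, t) in self.edges if s == name)


# Modelling starts from a user activity and works inwards.
graph = DependencyGraph()
graph.add_entity("Scientists", "User Community")
graph.add_entity("Analyse data", "Process")
graph.add_entity("Raw data", "Digital Object")
graph.add_entity("Analysis software", "Technical Service")
graph.add_dependency("Scientists", "Analyse data", label="performs")
graph.add_dependency("Analyse data", "Raw data", label="reads")
graph.add_dependency("Analyse data", "Analysis software", label="runs on")
```

Keeping properties on both vertices and edges mirrors the idea of a property graph, so that further parameters can be attached without changing the structure.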

2.3. Document Structure

This document first introduces some basic scenarios and user stories in order to illustrate the value and possible applications of a digital ecosystem model (chapter 3). Chapter 4 provides a detailed description of the model. Its first section (section 4.1) describes the general digital ecosystem perspective. It explains why this term was chosen and what the authors see as the key difference to other approaches. Each of the core entities of a digital ecosystem model (the vertices of a dependency graph) is described in sections 4.2 and 4.3 with its properties and typical dependencies and change events. Sections 4.4 and 4.5 then describe the dependency types and change events, which are closely related and form the edges of a dependency graph. How these basic components can be used to actually create a digital ecosystem model is explained in chapter 5 and illustrated with an example in chapter 6. The document concludes with a summary and an outlook on how this basic foundation will be extended, applied and supported by tools in the course of the PERICLES project. Annex A contains the entity descriptions and the change and dependency types in tabular form.

3. Scenarios for a digital ecosystem model

To illustrate the value and possible applications of a digital ecosystem model, this section collects short user stories in tables. This format was chosen to make it easy to refer to the functionalities and benefits. User stories are common in agile software development, where they are used to express briefly what the system should do for the user [1]. Some of these user stories, but not all, are based on the Media and Science case studies in deliverable D2.3.1. Other user stories have simply been invented, since their main use in this document is not to specify an implementation but to illustrate the potential use.

The tables consist of five columns. User Story ID is a unique number for each user story which can be used as a reference. The role column refers to the digital ecosystem role from Annex B, which summarises the different stakeholder roles from the Media and Science case studies. The columns “I want to (do something)” and “So that (benefit)” form a short sentence from the role's perspective, expressing which activity the role wants to perform and what value this brings to the user. Reference to source refers to the deliverable D2.3.1 if a user story corresponds to one of

Graph analysis

This table collects all user stories that involve reasoning over a digital ecosystem graph. If the ecosystem is modelled according to chapter 5, the statements these user stories ask for can be derived from the graph.

| User Story ID | As a (role) | I want to (do something) | So that (benefit) | Reference to source |
| --- | --- | --- | --- | --- |
| US-1 | Data Manager | Query a model to get an overview of external services that are being used. | I can make an estimation of potential problems if these services fail or are decommissioned. | partly UR-CO-DAT-09, UR-AM-SBA-22 |
| US-2 | Data Manager | Know what could happen if a certain dependency breaks. | I can evaluate the impact on my digital ecosystem, user services, repository or repository content. | |
| US-3 | Data Manager | Evaluate to which extent our institution fulfils the expectations of a changed or new user community. | I know whether and how to change the policies to adapt to the new needs of the community. | UR-SC-POL-19 |
| US-4 | Data Manager | Analyse our digital ecosystem for fragile dependencies or single points of failure. | I can plan the further development of our infrastructure and services to include more robust or additional fall-back solutions. | |
| US-5 | Data Manager | Get a sorted list of critical services and processes that are important for our users, based on the weights of the dependencies. | I can create an emergency plan on how to restore a service on failure and define the reaction time on a failure (similar to service level agreements). | |
| US-6 | All | Query a registry which keeps track of the previous dependencies in our digital ecosystem; see also US-7 to US-17 for applications of this general functionality. | I can interpret old decisions and old data. | UR-CO-DAT-07, UR-SC-PRO-29, UR-SC-PRO-33 |
| US-7 | SOLAR Scientist | Know which data was processed with algorithm x. | I can reprocess this data when the algorithm is faulty or is to be replaced. | UR-SC-DAT-47 |
| US-8 | Data Consumer, Data Manager, Data Owner | Understand how an artwork has been presented in the past. | I can compare the currently presented version, where several changes have been made over time, and decide whether I can still trust the authenticity of the artwork. | UR-AM-SBA-18, UR-AM-SBA-10, UR-AM-SBA-14 |
| US-9 | Conservators, Curator | Document the changes and conservation decisions applied to a digital artwork. | The conservation process is transparent for all current and future staff and also for the visitors of the exhibition. | UR-AM-SBA-15 |
| US-10 | Mission operator, SOLAR scientists | Know which version of the mission database was used to post-process the raw data. | I can interpret the science data correctly. | UR-SC-DAT-49, UR-SC-PRO-35 |
| US-11 | Mission operator | Find out what software upgrades have been made on the ISS. | I can understand whether science data has been corrupted by changes to the operational environment. | partly UR-SC-PRO-19 |
| US-12 | Mission operator | Know if there was a change to the flight rules or payload regulations. | I know whether the recommended approach to resolving an anomaly can still be applied. | UR-SC-PRO-36 |
| US-13 | Data Manager | Evaluate how the change or failure in a workflow affects the implementation of our policies. | I know how critical the change or failure is. | |
| US-14 | Data Manager | Receive a formal dependency model together with data from external parties. | I can understand whether our infrastructure is compatible (e.g. policies, processes, technical services) if we combine and exchange data with an external party. | |
| US-15 | Data Manager | Figure out if the implicit knowledge has been captured and written down. | I can give access to the data to another user community. | partly UR-SC-PRO-40 |
| US-16 | SOLAR Scientist, Scientists (same domain), Scientist (other domain) | Understand and re-use data from existing experiments. | I can use this for my own experiments or compare the old and new data. Or I can apply new algorithms to the old data and compare the results. | UR-SC-POL-22 |

Table 1: usage cases based on graph analysis
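Two of the analyses above (US-2, impact of a broken dependency, and US-5, a list sorted by dependency weights) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the example entities, the numeric weights and the ranking rule (sum of incoming weights) are hypothetical and are not taken from this report.

```python
from collections import deque

# Hypothetical example edges: (dependent, dependency, weight).
DEPENDENCIES = [
    ("Scientists", "Analyse data", 1.0),
    ("Analyse data", "Raw data", 1.0),
    ("Analyse data", "Analysis software", 0.8),
    ("Archive data", "Storage service", 1.0),
    ("Raw data", "Storage service", 0.5),
]


def impacted_by(failed, dependencies):
    """Return every entity that depends, directly or transitively, on `failed`."""
    dependents = {}
    for source, target, _weight in dependencies:
        dependents.setdefault(target, []).append(source)
    seen, queue = set(), deque([failed])
    while queue:  # breadth-first walk against the edge direction
        current = queue.popleft()
        for dependent in dependents.get(current, []):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return seen


def rank_by_incoming_weight(dependencies):
    """Sort entities by the summed weight of the dependencies pointing at them."""
    totals = {}
    for _source, target, weight in dependencies:
        totals[target] = totals.get(target, 0.0) + weight
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))


impact = impacted_by("Storage service", DEPENDENCIES)
ranking = rank_by_incoming_weight(DEPENDENCIES)
```

In this toy graph a failure of the storage service reaches the scientists through three intermediate entities, which is the kind of indirect effect US-2 and US-4 are concerned with.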

Basic graph manipulation

This category contains user stories that manipulate the graph but do not have a significant impact on other entities, for example adding a new process that is not related to existing processes.

| User Story ID | As a (role) | I want to (do something) | So that (benefit) | Reference to source |
| --- | --- | --- | --- | --- |
| US-17 | Data Owner | Modify the system to change operational roles and responsibilities | Reflect organisational changes of the institution to the system and processes | Science |
| US-18 | Data Manager | Update the digital ecosystem and add a new user community. | So that they can reuse the existing data. | Partly UR-SC-PRO-40 |
| US-19 | Mission operator | Update the system to take account of new processes (e.g. approval workflows) | To allow a further development of the system | WP2 Science |
| US-20 | Data Manager | Add a policy to lock certain files or set a closure period | To respect privacy of persons and conform to data protection regulations | UR-AM-BDA-06, UR-AM-BDA-07 |

Table 2: List of basic graph manipulations that do not have a significant impact on the digital ecosystem
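One conservative way to read "no significant impact on other entities" is that a change only introduces vertices and edges that touch nothing already in the graph. The Python sketch below checks exactly that before merging a change. The function name, the edge representation and the example entities are assumptions for illustration; the report itself does not prescribe such a check.

```python
def add_unrelated(edges, new_edges):
    """Merge `new_edges` into `edges` only if they touch no existing entity.

    Both arguments are sets of (dependent, dependency) name pairs.
    """
    existing = {name for edge in edges for name in edge}
    touched = {name for edge in new_edges for name in edge} & existing
    if touched:
        raise ValueError(f"change affects existing entities: {sorted(touched)}")
    return edges | new_edges


current = {
    ("Analyse data", "Raw data"),
    ("Analyse data", "Analysis software"),
}

# A new approval workflow that uses only its own, new service is accepted.
updated = add_unrelated(current, {("Approve request", "Approval service")})
```

A change that connects to an existing entity is rejected by this check and would instead need the kind of impact analysis described for Table 1.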

Tools

The following table contains a list of possible manipulations on the digital ecosystem if there is (software) tool support. These manipulations go beyond graph manipulation and basic graph reasoning.

User Story ID

As a (role) I want to (do something) So that (benefit) Reference to source

US-21 Data Manager

Be made aware of relevant changes in a central dashboard.

We can propose strategies to adapt to the changes.

US-22 Data Manager

Monitor my digital ecosystem and get an alert if a certain threshold is reached, e.g. get an alert if the execution of policies or processes fails.

We have a transparent view of the digital ecosystem and will be notified before any damage occurs.

-

US-23 Data Manager

Reconfigure a process to use another file format

I can retire the legacy software and use a new file format.

-

US-24 Data Manager

Update policies to reflect new semantics.

New terminology and ontology can be introduced in the system.

US-25 Data Manager

Update policies to change the way digital files are saved by the systems.

I can reorganize my storage and backup systems.

-

US-26 Data Manager

Browse a public directory of policies and the realisation of each policy as a process

It is easier for me to follow common practice and I can be sure that my adapted policies are well proven.

Partly UR-SC-PRO-06

US-27 Data Manager

Use the digital ecosystem model (graph) and extract which data needs to be kept

Automatically generate a configuration for the PET tool

Partly UR-SC-POL-08

US-28 Data Owner, Data Manager

Be alerted if significant properties of an updated version (e.g. file format, presentation form) are not satisfied

I am notified of possible problems and can update the generation process and re-create the derived format.

UR-AM-SBA-28

Table 3: Possible manipulations with tools working on the digital ecosystem model

Abstract summary of common user stories

This summary groups similar user stories from the tables above and makes them more generic. The last column, “Category”, assigns the stories to the categories graph analysis, basic graph manipulation and tools.

User Story ID

As a (role)

I want to (do something)

So that (benefit) Source Category

US-C-1 Data Manager

Analyse the current digital ecosystem model to get an overview of my digital ecosystem.

I can get a list of external services, get a ranking of important activities inside my digital ecosystem based on the weight, make an estimation of potential problems, can identify fragile entities with short lifecycles and make a plan for future development of my digital ecosystem.

US-1 US-4 US-5

Graph analysis

US-C-2 Data Manager

Evaluate the effects of change

To see what happens to my digital ecosystem if there is a change to the entities or dependencies. In particular, what are the effects on the services if policies change? What happens if user communities change or have new requirements?

US-2 US-13

Graph analysis

US-C-3 All Compare parts of the digital ecosystem against the change log of the digital ecosystem

Get information and understand how something has been done in the past.

US-6 US-7 US-8 US-9 US-10 US-11 US-12

Graph analysis

US-C-4 Data Manager, Data Owner

See since when something is supported

To prove since when our digital ecosystem has been compliant with something (e.g. a legal requirement)

US-C-3 Graph analysis

US-C-5 Data Manager

Compare digital ecosystem graphs.

I can see if the workflows and organisation of our institution are compatible with those of other institutions.

US-14 US-3 US-15

Graph analysis

US-C-6 All Get the history of the digital ecosystem

To see how something has been done or represented in the past and since which date a procedure has changed.

US-16 US-6 US-7 US-8 US-9 US-10 US-11 US-12

Graph analysis

US-C-7 Data Owner, Data Manager

Update the digital ecosystem model.

Take account of formal organisational changes and enable further development of the digital ecosystem.

US-18 US-17 US-19 US-20

Basic graph manipulation

US-C-8 Data Owner, Data Manager

Have a central place (dashboard) to monitor the digital ecosystem

I can be aware and notified of relevant changes and possible problems.

US-21 US-22 US-28

Tools

US-C-9 Data Owner, Data Manager

Modify digital ecosystem to implement change.

To change the behaviour of the entities, e.g. update, change, delete entities, change dependencies.

US-23 US-24 US-25

Tools

US-C-10 Data Manager

Use digital ecosystem model(s) as information source.

To have a common directory of common practice and proven policies and processes.

US-26

Tools

US-C-11 Data Owner, Data Manager

Use the digital ecosystem or parts of it as source for automatic configuration.

The tools are automatically configured (e.g. required metadata fields, retention or lock period, …)

US-27 Tools

Table 4: Abstract usage cases as summary from the tables before

4. Detailed description of the model

In this chapter the general concept of a digital ecosystem is introduced and the five different entity

types and their properties are presented.

4.1. Digital ecosystem

PERICLES develops an approach which tries to extend the traditional focus of digital preservation practice on technological issues to include dependencies between all kinds of entities and change events of a variety of types. The central concept of this approach is the "ecosystem".

Existing approaches to preservation typically focus only on the technical environment necessary for using digital objects, and in particular on the issue of format change. Thus the main task of digital preservation is often seen as managing this type of change to prevent technical obsolescence of the digital objects. In contrast, PERICLES will investigate how changes in any element of the environment of a digital object affect its usefulness and the way it is understood, and how such change can be managed.

A consequence of approaches focussing on technical change is that a long-term archive is seen as the central component for digital preservation. While the key documents and standards do acknowledge that the task of such archives is not only technical, but also organisational, the authors of this report have the impression that institutions often overemphasize the technical aspects. The implicit and false assumption of many institutions seems to be that an archive as a technical system can solve their preservation challenge.

PERICLES tries to address more variability than just changes to the technical environment. Without claiming completeness PERICLES tries to address:

Change in user communities. This includes e.g. change in expectations, requirements, abilities and background knowledge, which can all result in a digital object no longer being as useful as it could be. These kinds of changes may occur within the same user community or group of people. But an institution may also be confronted with these kinds of changes because it has to deal with new user communities while previous communities may vanish. Institutions also have to cater for multiple user communities.

Changes in the institution. Institutions can change in a variety of ways, e.g. they can merge, or objects can be exchanged and managed by other institutions. The aims which are reflected in the policies of an institution may change and this may immediately have consequences for the retention of digital objects. The processes and workflows may change because of internal or external requirements, or because new technologies become available or old technologies are no longer available.

Technology change. To access and make use of a digital object it is necessary to use whole technology stacks and infrastructures. These consist not only of the application software for the digital object, but include also e.g. hardware, operating systems, interfaces, networks and remote web services. These technology stacks and infrastructures can change in an uncontrolled (e.g. malfunction) or controlled (e.g. updates) manner. Often technology is replaced not because of malfunction, but because new technology is more efficient and satisfies new requirements. Technical obsolescence is usually not absolute in the sense that the necessary technology stack is not available, but relative to the requirements of users and institutions who do not want to use that technology stack anymore.

Relevant change may also occur in the larger social or cultural context common to the institution and the user communities. Examples are laws, disciplines or cultural norms which require changes for the institutions and users.

Change does not have to be "global" and "discrete". Often our initial idea is that a change in document format has to impact all content and user communities. Such change would be discrete in the sense that it occurs at a specific point in time. But in reality change may be a slow, local and gradual evolution. In particular, semantic and community change may be continuous rather than discrete.

The changes listed above do not need to be a threat to the usefulness of digital objects. It is primarily unmanaged change that may result in digital objects that are not usable or not as useful as they could be. If change is managed appropriately and e.g. a new generation of access services ensures backward compatibility then digital objects might actually become more useful than they were.

In PERICLES a "digital ecosystem" is interpreted as consisting of different types of entities with different dependency relations between them. The entity types that PERICLES considers at the moment are:

Users (which interact with institutions and their services to access and use digital objects and which have certain expectations and background knowledge)

Institutions (which offer users services and access to digital objects in accordance with their institutional aims)

Digital Objects (which are based on certain technology and background knowledge)

Policies (which express aims and general approaches of an institution)

Processes/Workflows (one or more activities which use services to fulfil business aims)

Technical Systems/Services (which operate on digital objects and interact with other services)

In PERICLES "digital ecosystem" should be interpreted as a concept for analysing and modelling the ability of an infrastructure to maintain the usefulness of digital objects. We define a digital ecosystem in the following way: A digital ecosystem consists of all the entities and relations that influence or are necessary for the successful use of digital objects at a later point in time. Thus it is meant to be applied neither to the whole internet and the set of all digital technologies nor just to individual systems and technologies. The analytical value of the concept can probably be utilized best when it is applied on the level of a single institution or a limited number of institutions.

Figure 2: Preliminary visualisation of digital ecosystem entities with exemplary interactions (grey arrows). Red arrows symbolize examples of change of an entity which can affect dependencies and interactions with other entities.

There are several reasons why the term "digital ecosystem" is used instead of the related terms "system", "environment" or "infrastructure":

"Digital ecosystem" emphasises the fact that we are talking about a diverse set of interacting entities which, unlike for example an integrated system, have not been planned in advance. Even if an institution initially implements a well-planned infrastructure, change usually accumulates gradually and people are confronted with a situation or infrastructure which is best described as "grown": they have to deal with entities which were developed at different times, when the environment might have looked different, e.g. legacy systems.

"Digital ecosystem" is closely related to the notion of evolution, change and development. Unintended and uncontrolled change occurs often. Even just for technology this may range from technical progress and obsolescence to technical system failures.

Related to unintended and uncontrolled change is the observation that a digital ecosystem is not under the full control or authority of just a single actor or community. Individual institutions usually have limited influence on user communities, partner institutions, technology providers, legislation or society in general which all affect the conditions of operation.

A digital ecosystem also has a potential for niches. Sometimes, there are unforeseeable opportunities for certain interactions, roles and functions in a digital ecosystem.

These reasons do not, of course, force us to use the term "digital ecosystem", and other terms could be used with similar qualifications. But since PERICLES tries to take these aspects seriously, "digital ecosystem" is more appropriate.

The qualification "preservation" of the term digital ecosystem does not mean that we only examine digital ecosystems and their entities which are explicitly intended to contribute to digital preservation (like an archive, the designated communities, the policies and processes for digital preservation). This is how for instance the SCAPE project uses the term "preservation ecosystem", see section 0. Instead we are investigating how a digital ecosystem should be managed so that the usefulness of the digital objects can be maintained. In fact the assumption in PERICLES is that a digital ecosystem does not necessarily have to contain a dedicated preservation system or a digital archive at all to maintain the usefulness of digital objects in the long term. (This has been succinctly put as "Preservation is not a place" [2] or as "Preservation as capability" [3] of organisations and ecosystems.) The term "preservation ecosystem" should be read as a synonym for "preservation in digital ecosystems" or "longevity of digital ecosystems" in order to emphasise the management and long-term perspective for digital ecosystems.

The working assumption is that it should be possible to develop a formal model of a digital ecosystem. Such a formal model would include the dependencies and interactions between all entities as a dependency graph and would allow the consequences of changes to different entities to be simulated. Based on certain properties of the formal representation (e.g. graph-theoretic properties such as connectedness or the absence of single points of failure), we should be able to evaluate digital ecosystems and how suitable they are from a long-term perspective. For example, an institution which offers access through several different services should be better able to deal with changing user expectations regarding access.
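The idea of simulating the consequences of a change on a dependency graph can be illustrated with a minimal sketch. It is not the PERICLES model: the entity names, the dictionary layout and the helper functions `affected_by` and `impact_ranking` are assumptions made for this example. The sketch treats a change as propagating to everything that directly or transitively depends on the changed entity, and ranks entities by how many others they would affect; a very high rank can be read as a hint at a single point of failure.

```python
from collections import defaultdict, deque

# Hypothetical toy ecosystem: each key depends on the entities in its list.
DEPENDS_ON = {
    "access process": ["web portal", "image viewer"],
    "web portal": ["storage system"],
    "image viewer": ["storage system"],
    "telemetry data": ["storage system", "access process"],
    "storage system": [],
}


def dependents_index(depends_on):
    """Invert the mapping: entity -> entities that directly depend on it."""
    index = defaultdict(set)
    for entity, requirements in depends_on.items():
        for required in requirements:
            index[required].add(entity)
    return index


def affected_by(changed, depends_on):
    """Return every entity that directly or transitively depends on `changed`."""
    index = dependents_index(depends_on)
    seen, queue = set(), deque([changed])
    while queue:
        current = queue.popleft()
        for dependent in index[current]:
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return seen


def impact_ranking(depends_on):
    """Rank entities by the number of other entities a change would reach."""
    impact = {entity: len(affected_by(entity, depends_on)) for entity in depends_on}
    return sorted(impact.items(), key=lambda item: (-item[1], item[0]))
```

In this toy graph a change to the storage system reaches all four other entities, whereas a change to the telemetry data reaches none. Offering access through both a portal and a viewer does not remove the shared dependency on storage, which is the kind of observation such an analysis is meant to surface.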

4.1.1. State of the art

This section tries to give an overview of approaches to digital preservation and covers the topics trustworthy repository, format obsolescence, adding preservation to existing systems, infrastructure independence, graph based preservation and other ideas.

Trustworthy repository

There are several projects and standards which address digital preservation by defining the

properties a trustworthy repository should satisfy:

ISO 14721 aka OAIS (Open Archival Information System) is a reference model that describes the

functionalities and information types necessary for a dedicated archive in an abstract way [4].

CASPAR was a project from people behind the OAIS standard which implemented OAIS

functionalities.1

ISO 16363 (previously known as TRAC, Trustworthy Repositories Audit & Certification: Criteria and Checklist), available from CCSDS [4]. It follows on from the OAIS reference model for archives and is divided into three key sections: Organisational Infrastructure; Digital Object Management; and Infrastructure and Security Risk Management. Each section provides a number of metrics against which a repository will be evaluated, with examples of how a repository can demonstrate that it meets the requirement.

Data Seal of Approval2 is a weaker set of criteria for archives but provides a low barrier to entry

for organisations.

1 See CASPAR: Cultural, Artistic and Scientific knowledge for Preservation, Access and Retrieval,

http://www.casparpreserves.eu

DIN 31644, Information and Documentation - Criteria for Trustworthy Digital Archives, was published by the German Institute for Standardization in 2012.3 It is based on the nestor Catalogue of Criteria for Trusted Digital Repositories, Version 2.4

The DASISH project has proposed a five-level maturity model for trustworthiness based on

existing trust-related standards and criteria.5

DRAMBORA analysed digital repositories as the means for digital preservation but applied a risk management approach.

Format obsolescence

Many projects have tried to address digital preservation as a file format obsolescence issue. According to this viewpoint, the main issue for digital preservation is that the software necessary for using a digital object of a certain file format becomes obsolete. As a consequence, migration (changing the file format) and emulation (providing an encapsulated environment for old software) are among the most discussed and investigated topics in digital preservation. There has been some discussion of whether file format obsolescence actually is an issue, notably by David Rosenthal, one of the main proponents of LOCKSS [5][6]. Unfortunately the discussion about obsolescence usually neglects user requirements and the fact that obsolescence is relative to them.

Preservation as capability

Preservation as capability is an approach which emphasizes that preservation can be added to

already existing organisations or systems and argues against the often implied necessity of a

dedicated system for preservation.

Christoph Becker et al, Modelling Digital Preservation Capabilities in Enterprise Architecture [3]

Antunes et al, Assessing Digital Preservation Capabilities Using a Checklist Assessment Method

[7]

The SHAMAN reference architecture already contained some of these ideas, although most of

them are also present in other sources because the same persons were involved [8].

BenchmarkDP deals with preservation as a capability of systems and processes6. "In the dimension of systems, process and organisations, existing models are not sufficiently taking into account the concerns of digital longevity and information preservation over time. [...] For systems and processes, we will create and evaluate a Capability Maturity Model for DP that enables systematic process improvement and governance of ICT systems with respect to longevity over time."[9]

2 See Data Seal of Approval: Towards sustainable and trusted data repositories, http://www.datasealofapproval.org/.

3 Deutsches Institut für Normung E.V. (2012): DIN 31644: Information and documentation – Criteria for trustworthy digital archives, http://www.techstreet.com/products/1827799.

4 Nestor Working Group (2009): Catalogue of Criteria for Trusted Digital Repositories - Version 2, http://nbn-resolving.de/urn:nbn:de:0008-2010030806.

5 DASISH: Data Service Infrastructure for the Social Science and Humanities, http://dasish.eu/publications/projectreports/D4.1_-_Roadmap_for_Preservation_and_Curation_in_the_SSH.pdf/

6 See http://benchmark-dp.org/publications/.

Preservation as infrastructure independence

Preservation as infrastructure independence defines preservation as keeping institutional aims and

processes stable across different technical infrastructures. The main proponent of this approach is

Reagan Moore who works on implementing this with the iRODS system.7

There is the proverb that "long-term preservation is interoperability with the future" (unknown

source). Especially for investigating change in a heterogeneous ecosystem it might be useful to

interpret preservation as an interoperability task. Relevant documents might be:

Digital Library interoperability Cookbook8

For research data the e-IRG has mentioned this task in their report9

Managing preservation with graphs

When we model the relations between many different entities in a preservation ecosystem the result

will be a graph or network. There are a few works on managing preservation with networks and

graphs. Two examples:

Conway et al (2012): Managing Risks in the Preservation of Research Data with Preservation

Networks [10].

Related to managing preservation with networks and graphs is dependency management,

described in Tzitzikas (2007): Dependency Management for the Preservation of Digital

Information [11].

Other ideas

There are some new, still vague ideas like "Self-Preserving Objects" which have not been described in greater detail yet but can be seen as related to the PERICLES idea of "Preservation by design".10 The main idea seems to be to have objects which are as autonomous, autarkic and self-describing as possible.

7 See Moore, R. (2008): Towards a Theory of Digital Preservation, Vol.8, No.1, p.63-75, http://dx.doi.org/10.2218/ijdc.v3i1.42.

8 See Digital Library Interoperability, Best Practice and Modelling Foundations: D3.4 Digital Library Technology and Methodology Cookbook, http://www.dlorg.eu/index.php/outcomes and http://www.dlorg.eu/uploads/Booklets/booklet21x21_cookbook.pdf.

9 See e-IRG Report on Data Management: Data Management Task Force, November 2009, chapter 3, http://ec.europa.eu/research/infrastructures/pdf/esfri/publications/esfri_e_irg_report_data_management_december_2009_en.pdf.

10 See The Future of the Past – Shaping new visions for EU-research in digital preservation, 2011, http://cordis.europa.eu/fp7/ict/telearn-digicult/future-of-the-past_en.pdf.

4.2. Dependencies reference model

A dependency graph contains the five generic entity types Digital Object, Technical Service, Policy, Process and User. Such a dependency graph consists of vertices and edges. The edges that connect these types express the complex dependencies between the entities. External changes and modifications of dependencies may influence the graph. The following diagram depicts generic dependencies among the entity types:

Figure 3: Dependencies reference model

Of course it does not make much sense to build a graph with the generic entity types. It is only useful with instances of the entity types. For example there is a Technical Service “storage system” that exposes a storage service. To support this description, guidelines and a lifecycle model for each of the entity types will be given in the next subchapter.
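The relation between the generic entity types and concrete instances can be sketched as follows. This is an illustrative data structure only: the class names, the "backup policy" instance and the direction of the edges (source depends on target) are assumptions made for this example, not part of the reference model.

```python
from dataclasses import dataclass
from enum import Enum


class EntityType(Enum):
    """The five generic entity types named in the reference model."""
    DIGITAL_OBJECT = "Digital Object"
    TECHNICAL_SERVICE = "Technical Service"
    POLICY = "Policy"
    PROCESS = "Process"
    USER = "User"


@dataclass(frozen=True)
class Entity:
    """A concrete instance (vertex) of one of the generic types."""
    name: str
    type: EntityType


@dataclass(frozen=True)
class Dependency:
    """A directed edge: `source` depends on `target` (assumed convention)."""
    source: Entity
    target: Entity


# Instances; only "storage system" and "Telemetry-Data" appear in the report,
# "backup policy" is a hypothetical addition.
storage = Entity("storage system", EntityType.TECHNICAL_SERVICE)
telemetry = Entity("Telemetry-Data", EntityType.DIGITAL_OBJECT)
backup_policy = Entity("backup policy", EntityType.POLICY)

edges = [Dependency(telemetry, storage), Dependency(backup_policy, storage)]


def dependencies_by_type(edges):
    """Collapse instance-level edges to (source type, target type) pairs."""
    return {(edge.source.type, edge.target.type) for edge in edges}
```

Collapsing the instance graph back to type pairs gives a quick way to compare a concrete ecosystem against the generic dependencies of the reference model.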

4.3. Entities of the digital ecosystem

This subsection describes the five different types of entity: digital object, policy, process, technical service and user. These entity types will later be used for the dependency graph that connects these entities with each other. For a tabular and synoptic description of all entities, their lifecycles and dependencies see Annex B.

4.3.1. Digital Object

There are many different definitions and approaches to define, describe and model digital objects,

for example:

1. A digital object is “a data structure whose principal components are digital material, or data,

plus a unique identifier for this material.”[12]

2. “Digital objects (or digital materials) refer to any item that is available digitally.”[13]

3. “An object composed of a set of bit sequences.” [4]

4.3.1.1. Definition

The definitions above are generally compatible. It would be expected that any item available digitally

(2) is composed of a set of bit sequences (3). It is conceivable that a digital object could be defined in

terms of its content (with this definition possibly associated with a unique identifier) and the bit

sequences could be created or recreated on demand, but this would be very unusual. The definition

(1) imposes the additional requirement that the object have a unique identifier, but since it is

possible to have objects that satisfy (2) and (3) that do not have this property, we do not consider

this to be a defining feature.

We therefore define a digital object following (2) above as “any item that is available digitally”.

There are several aspects to what a digital object is, and a useful categorisation is provided by

Thibodeau [14]:

A digital object has at least three aspects:

1. The physical object: the physical manifestation of the object (e.g. bits as magnetized areas of

a hard drive).

2. The technical object: processable by machines/computers (e.g. a set of files) ('object' in

PREMIS).

3. The conceptual object: understandable/usable by humans ('intellectual entity' in PREMIS). It

may include 'significant properties'. These are the properties which should be preserved

(content, context, structure, behaviour, appearance).

4.3.1.2. Lifecycle of a Digital Object

Generally the lifecycle follows the principles of the scheme depicted in Annex A. The main causes of lifecycle changes in digital objects are iterations in the lifecycles of other entities on which they depend. Whenever an entity on which a digital object depends is modified, it is quite likely that the digital object itself needs to be altered as well.

A new iteration of the lifecycle of a Digital Object happens when the semantics or knowledge of a user changes in such a way that the Digital Object, or the view of it, needs to be changed. A new iteration is also required to reflect changes in other entities on which the Digital Object depends.

4.3.1.3. Dependencies

Digital objects are used and processed by all entity types. Digital objects can be regarded as fragile as they typically rely on the existence and stability of entities of other types to be accessible and usable at all. For example, in most cases a technical service is required to access and render a digital object.

For an overview of the dependencies see Annex B.

4.3.1.4. Changes

Because digital objects frequently have many dependencies, both for creation and reuse, they are highly susceptible to change. However, we can identify two main categories of change. One is induced by changes to other dependent entities. For example, a metadata generation process can create additional and new attributes which are embedded in the digital object and therefore change it.

The other origin of change is a change to the semantics or terminology resulting from changes to the background knowledge of a user community. If the community's background knowledge changes, a user from that community may no longer be able to understand the content of a digital object correctly. For example, new scientific fields lead to new interpretations of information and the Digital Object needs to be modified to reflect the new knowledge; the naming conventions of several attributes may have changed or some attributes may have been merged. This leads to lifecycle changes of a Digital Object.

For more examples of changes on digital objects see Annex B.

4.3.1.5. Properties of a Digital Object

It is not necessary to have a full description of a digital object for the dependency graph. An object description can be done in any format suitable for the purpose like e.g. PREMIS or METS in preservation contexts.

Name | Description | Example(s)

Name | Name to identify the object | Telemetry-Data

Version | The version number allows assigning each entity to a certain state of the graph. This is necessary for tracking the graph history. | 0.1

Link to full object description | Reference to a full object description in a suitable format. | METS model #2

Change parameters | Expressed as ordered pairs of (change type, volatility). This measures how an object is likely to change and how likely this is to occur. For example the background knowledge of a user community which can be classified according to a certain scheme (e.g. a value between 0 and 1). | (Semantic knowledge, 0.4), (Change of role, 0.1)

Sensitivity parameters | Expressed as ordered pairs of (change type, sensitivity expressed as a value between 0 and 1). This measures to what changes (i.e. through dependency on another entity) an object is sensitive and the likelihood of this change occurring. | Technical changes: 0.4; Semantic changes: 0.3

Table 5: Preliminary properties of the entity Digital Object
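As a rough illustration, the preliminary properties in Table 5 could be held in a small record attached to a vertex of the dependency graph. The class name `DigitalObjectNode`, the field names, the use of dictionaries for the ordered pairs and the helper `most_sensitive` are assumptions made for this sketch; the example values are those given in the table.

```python
from dataclasses import dataclass, field


@dataclass
class DigitalObjectNode:
    """Graph-level description of a Digital Object (hypothetical layout)."""
    name: str
    version: str
    description_link: str
    # change type -> volatility, a value between 0 and 1
    change_parameters: dict = field(default_factory=dict)
    # change type -> sensitivity, a value between 0 and 1
    sensitivity_parameters: dict = field(default_factory=dict)

    def __post_init__(self):
        # Reject values outside the 0..1 range used in Table 5.
        for params in (self.change_parameters, self.sensitivity_parameters):
            for change_type, value in params.items():
                if not 0.0 <= value <= 1.0:
                    raise ValueError(f"{change_type}: {value} is outside [0, 1]")

    def most_sensitive(self):
        """Return the change type with the highest sensitivity, or None."""
        if not self.sensitivity_parameters:
            return None
        return max(self.sensitivity_parameters, key=self.sensitivity_parameters.get)


telemetry = DigitalObjectNode(
    name="Telemetry-Data",
    version="0.1",
    description_link="METS model #2",
    change_parameters={"Semantic knowledge": 0.4, "Change of role": 0.1},
    sensitivity_parameters={"Technical changes": 0.4, "Semantic changes": 0.3},
)
```

Keeping only a link to the full description keeps the graph vertex small, in line with the remark that a full object description is not needed for the dependency graph.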

4.3.1.6. State of the art

Important criteria for digital objects are authenticity and significant properties. These topics are covered in the common guidelines and principles section below, together with aggregation and exchange formats and persistent identifiers for uniquely identifying digital objects. There is also a reference to common graphical notations of objects and common metadata standards.

Common guidelines and principles

Authenticity

In the OAIS model, authenticity of a digital object is defined as "the degree to which a person (or

system) regards an object as what it is purported to be. Authenticity is judged on the basis of

evidence." [4, pp. 1–9]

Seadle (2013) defines the authenticity of a digital object as “in a purely technical sense, that a

document's integrity has been checked using mathematical algorithms against other copies on

independently managed servers and that provenance records show that the document has a clearly

established succession from a clearly defined original.” [15]
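The integrity part of this technical notion of authenticity can be sketched as a comparison of checksums across copies. The sketch below is only an approximation under stated assumptions: the copies are in-memory byte strings rather than files on independently managed servers, SHA-256 is chosen as the algorithm, the location names are invented, and provenance records are not covered at all.

```python
import hashlib
from collections import Counter


def sha256_of(content: bytes) -> str:
    """Return the SHA-256 digest of a copy as a hexadecimal string."""
    return hashlib.sha256(content).hexdigest()


def divergent_copies(copies: dict) -> list:
    """Return the locations whose digest differs from the most common digest.

    With an even split between digests the choice of the reference digest
    is arbitrary, so a real check would need an additional tie-break rule.
    """
    if not copies:
        return []
    digests = {location: sha256_of(content) for location, content in copies.items()}
    majority_digest, _ = Counter(digests.values()).most_common(1)[0]
    return sorted(loc for loc, digest in digests.items() if digest != majority_digest)


# Hypothetical copies of the same document held at three locations.
copies = {
    "server-a": b"telemetry frame 42",
    "server-b": b"telemetry frame 42",
    "server-c": b"telemetry frame 4?",  # a damaged copy
}
```

A check of this kind only shows that copies agree with each other. Whether the agreed content descends from a clearly defined original still has to be established from provenance records.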

Significant Properties

There are many differing definitions of significant properties in the literature, for example:

● Significant properties also referred to as “significant characteristics” or “essences”, are

essential attributes of a digital object which affect its appearance, behaviour, quality and

usability. They can be grouped into categories such as content, context (metadata),

appearance (e.g. layout, colour), behaviour (e.g. interaction, functionality) and structure (e.g.

pagination, sections). Significant properties must be preserved over time for the digital

object to remain accessible and meaningful [16].

● Based on Wilson’s definition: The characteristics of digital objects that must be preserved

over time in order to ensure the continued accessibility, usability, and meaning of the

objects, and their capacity to be accepted as evidence of what they purport to record [17].

● The OAIS has introduced the related concept of "Transformational Information Property: an

Information Property the preservation of the value of which is regarded as being necessary

but not sufficient to verify that any Non-Reversible Transformation has adequately preserved

information content."[4, pp. 1–16]

Aggregation and exchange formats

A number of formats exist to enable aggregation and exchange of digital objects. A useful survey of requirements for preservation packaging formats, together with an analysis of which of the commonly used formats meet those requirements, is included in the paper by E. Zierau: “A survey and comparison of packaging formats for digital preservation”[18].

● The Open Archives Initiative Object Reuse and Exchange (OAI-ORE) defines standards for

the description and exchange of aggregations of Web resources11. This is based on the use of

Resource Maps. URIs can be used to identify an aggregation as a single object.

● The WARC (Web ARChive) format (ISO 28500:2009) specifies a method for combining

multiple digital resources into an aggregate archival file together with related information12.

● AXF (Archive eXchange Format): an AXF object contains the payload accompanied by structured or unstructured metadata, checksum and provenance information, and full indexing structures in an encapsulated package13.

● Data Catalog Vocabulary (DCAT) (W3C Working Draft 12 March 201314) is an RDF vocabulary

designed to facilitate interoperability between data catalogues published on the Web.

Aggregated DCAT metadata can serve as a manifest file to facilitate digital preservation.

Persistent Identifier

A persistent identifier is an artificial property that allows an object to be identified persistently and uniquely. Different identifier systems and registries exist. For example:

● A Digital Object Identifier (DOI) is an identifier as defined in the DOI System (an ISO standard15). The DOI System implements the Handle System16 and the indecs Framework17.

● An Archival Resource Key (ARK) is a Uniform Resource Locator (URL) that is a multi-purpose identifier for information objects of any type18.

The structure is [http://NMAH/]ark:/NAAN/Name[Qualifier] where:

NAAN is a unique identifier of the organisation that originally named the object.

NMAH is an optional and replaceable hostname of an organisation that currently provides a service for the object.

Qualifier is an optional string that extends the base ARK to support access to individual hierarchical subcomponents of an object, and to variants (versions, languages, formats) of components.

● A Persistent Uniform Resource Locator (PURL) is a permanent identifier in form of a valid

URL. The Persistent URL is an address on the World Wide Web that causes a redirection to

another Web resource. If a Web resource changes location (and hence URL), a PURL pointing

to it can be updated. A user of a PURL always uses the same Web address, even though the

resource in question may have moved[19].

● Primary key in relational databases. A primary key uniquely identifies a data set from a

11 Standard available under http://www.openarchives.org/ore/ and http://www.openarchives.org/ore/documents/CompoundObjects-200705.html.

12 WARC standard as draft http://bibnum.bnf.fr/WARC/WARC_ISO_28500_version1_latestdraft.pdf.

13 See AXF. Archive eXchange Format, http://www.openaxf.org/.

14 See W3C Working Draft. Data Catalog Vocabulary (DCAT), 2013, http://www.w3.org/TR/vocab-dcat/.

15 See DOI Handbook, http://www.doi.org/.

16 See Handle System. Unique and Persistent Identifier for Internet Resources, http://www.handle.net/.

17 See CORDIS. http://cordis.europa.eu/econtent/mmrcs/indecs.htm.

18 See https://confluence.ucop.edu/display/Curation/ARK.

database. The key can be a natural key or a surrogate key[20].

● A universally unique identifier (UUID) is used in software technology to identify objects and files without a central registry. A UUID is 128 bits long, and different variants and versions (1-5) exist to calculate that number[21].

● Any system that assigns a unique number to identify something. Examples are ISBN19 and European Article Number (EAN)20, which uniquely identify an item and provide a registry that supplies the details for that number.
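The ARK layout and the registry-free nature of UUIDs described in the list above can be illustrated with a minimal Python sketch. The regular expression is an assumption derived only from the structure quoted in this section (in particular, it assumes the Qualifier starts at the first "/" or "." after the Name); the host name, NAAN and Name values are made up for illustration and do not refer to real objects.

```python
import re
import uuid

# Pattern for the layout quoted above: [http://NMAH/]ark:/NAAN/Name[Qualifier].
# Assumption: the qualifier begins at the first "/" or "." after the Name.
ARK_PATTERN = re.compile(
    r"^(?:https?://(?P<nmah>[^/]+)/)?"  # optional, replaceable host (NMAH)
    r"ark:/(?P<naan>[^/]+)/"            # naming authority number (NAAN)
    r"(?P<name>[^/.]+)"                 # name assigned by the naming authority
    r"(?P<qualifier>[/.].*)?$"          # optional subcomponent or variant
)


def parse_ark(ark: str) -> dict:
    """Split an ARK string into its NMAH, NAAN, Name and Qualifier parts."""
    match = ARK_PATTERN.match(ark)
    if match is None:
        raise ValueError(f"not an ARK: {ark!r}")
    return match.groupdict()


# Hypothetical ARK, with and without the optional host part.
with_host = parse_ark("http://example.org/ark:/12345/x6np1wh8k/c3.pdf")
without_host = parse_ark("ark:/12345/x6np1wh8k")

# UUIDs need no central registry: version 4 is random,
# version 5 is derived deterministically from a namespace and a name.
random_id = uuid.uuid4()
name_based_id = uuid.uuid5(uuid.NAMESPACE_URL, "http://example.org/object/1")
```

Because the host part is optional, the same object can be addressed through a different service without changing the NAAN or Name, which is what makes the identifier persistent.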

Graphical or formal notation

In computer science there are different ways to model a digital object. There are general notations for software architecture and domain-specific forms. The best-known general notation is the set of UML diagrams. UML provides different diagram types to model the behaviour, landscape, relations and information flow of software and objects. The different diagram types are based on common diagram forms, e.g. flow charts, Petri nets and other types.

Other modelling notations are domain-specific and have a closed application area. An example is database modelling, which aims to create a data model that is suitable for relational databases. The graphical notation for relational data models is the entity-relationship (ER) diagram, which gives a high level overview of the data entities.

Relevant LTA metadata approaches

There are many approaches and projects for long term archiving which focus on digital objects. The

main strategies to preserve a Digital Object are bit-stream preservation, migration and emulation.

Since PERICLES has its initial focus on modelling a digital ecosystem and its entities, this section

covers metadata standards to describe a Digital Object.

There is a wide range of metadata standards for describing Digital Objects that are relevant to

preservation. Some of the most widely used are:

● MARC (MAchine-Readable Cataloging) standards are a set of digital formats for the description of items catalogued by libraries, developed by the US Library of Congress in the 1960s. MARC 21 was created in 1999 and is widely used throughout the world. The MARC 21 family of standards now includes formats for authority records, holdings records, classification schedules, and community information, in addition to the format for bibliographic records. MARCXML is a standardised XML representation of MARC 21 metadata21.

● METS (Metadata Encoding and Transmission Standard) is an XML schema to describe digital

library objects and associated metadata. It has three main sections: descriptive, files and

administrative information. The descriptive and administrative sections are wrappers that

enable METS to be used in conjunction with other schemas (e.g. MODS, PREMIS)22.

19 See https://en.wikipedia.org/wiki/International_Standard_Book_Number.

20 See http://upc-ean-information.com/.

21 See MARC 21, http://www.loc.gov/marc/.

22 See METS: Library of Congress: Metadata Encoding and Transmission Standard, http://www.loc.gov/standards/mets.

● MODS (Metadata Object Description Schema) is derived from MARC. It is often used as an

extension schema to METS (rich description works well with hierarchical METS objects).

MODS is a schema for a bibliographic element set that may be used for a variety of purposes,

and particularly for library applications.

● PREMIS (PREservation Metadata Implementation Strategies) provides a data dictionary for metadata to support the long-term preservation of digital objects. The PREMIS data model includes Intellectual Entities, Objects, Rights, Events and Agents. It can be used with METS23 and an OWL ontology version is available24. There is also ongoing work on "Describing Digital Object Environments in PREMIS" [22].

● FCM (Fedora Content Model)25 is an RDF-based approach to modelling digital objects by the

Fedora repository[23]. The relationship between FCM and METS is described in Gartner

(2012)[24].

● DIDL (Digital Item Description Language) is part of the MPEG-21 standard aimed at

multimedia applications. The core elements are definition of a Digital Item (a fundamental

unit of distribution and transaction) and users interacting with Digital Items. A Digital Item

may be a combination of resources such as videos, audio tracks or images, metadata, and a

structure for describing the relationships between the resources. See Organisation

Internationale de Normalisation, “ISO/IEC JTC1/SC29/WG11: Coding of Moving Pictures and

Audio”[25]

● MIX (Metadata For Images in XML) can be used standalone or as an extension schema with

METS26

● Dublin Core (DC) is widely used in repositories but has much freedom in interpretation of the

elements27. The Simple Dublin Core Metadata Element Set (DCMES) consists of 15 metadata

elements. Qualified Dublin Core is an ongoing process to develop exemplary terms extending

or refining DCMES.

Conclusion of state of the art

The usual preservation perspective on digital objects is to describe those objects in detail. Initially PERICLES will focus more on the relation of the digital object to other entities and less on the description of the digital object. The description of the digital object itself, and especially its semantic description, will become more relevant and will be extended when the challenge of semantic change is addressed in the second half of the project.

At the moment OAI-ORE and PREMIS in its OWL version seem to be most relevant for PERICLES since

they build on the same RDF approach as the LRM model. The PREMIS work on describing digital

23 See PREMIS Data Dictionary for Preservation Metadata, http://www.loc.gov/standards/premis/premis-mets.html.

24 See http://id.loc.gov/ontologies/premis.html.

25 See http://fedora-commons.org/documentation/3.0b1/userdocs/digitalobjects/cmda.html.

26 See MIX. NISO Metadata for Images in XML Schema. Technical Metadata for Digital Still Images Standard, http://www.loc.gov/standards/mix/.

27 See Dublin Core Metadata Initiative, http://dublincore.org/specifications/.

object environments is also highly relevant for the WP4 work on Significant Environment Information. The concept of significant properties has already been used in this WP4 work and will also be used in future work on dependency graphs and ecosystem modelling in order to prioritise and measure relevance.

Policy 4.3.2.

A policy is a general directive describing the aims, principles, and general direction of an institution with respect to the management of a digital ecosystem. The term policy in this case should not be confused with the policy definitions from IT, which typically concern access rights and permissions.

Policies can be described in varying degrees of precision from natural language to a formal

specification as rules. They can range from the high level (less specific and an expression of

objectives) to the lower level (more specific, closer to processes and services that implement them).

Low level policies can often be implemented or executed by processes. For instance, the policy that

multiple copies of a file must be kept could be implemented by a data management system.

Policies often contain a specification of the domain to which they apply and in which they will be enforced.

Definition 4.3.2.1.

In the Merriam-Webster dictionary[26] the word policy is defined as:

a. a definite course or method of action selected from among alternatives and in light of given conditions to guide and determine present and future decisions

b. a high-level overall plan embracing the general goals and acceptable procedures

especially of a governmental body

Both these definitions focus on what needs to be done, but it is often helpful to separate out a

desired outcome from the specific steps required to achieve it. In the context of PERICLES we

therefore define a policy as follows:

A policy is a plan that defines the desired state inside a digital ecosystem. A policy describes the

'what' (guidelines) and not the 'how' (implementation).

A policy may describe things that need to be done (it could dictate that a specific process is used),

but would not normally go into detail about this process. It cannot be assumed that a policy gives any

information about how it should be applied, enacted or enforced.

Lifecycle 4.3.2.2.

The lifecycle of a policy can be expected to change for two reasons: new requirements on other entities change those entities and are likely to cause a change in policies; and policies can be retired according to a schedule. See also Annex B.

Dependencies 4.3.2.3.

Policies are related to a formal expression of the digital ecosystem concept. They constrain the

structure and the interactions between entities. That means that basically each entity is dependent

on a policy. They are not only a static expression of a model, but can have an active role as a rule

base for making decisions, especially in processes.

Policies can be hierarchically grouped, so policies may have dependencies on other policies. Meta-policies (i.e. policies about policies) help to manage the lifecycle of policies, which creates further dependencies. See Annex B for a full overview.

Changes 4.3.2.4.

Policies form general rules for the digital ecosystem or act as rule base during process execution. It is

quite likely that a change of behaviour of the digital ecosystem requires a change in policies.

Examples that require modifications to the policies are: process changes, legal changes and

organisational changes. It is important to note that every entity in a connected dependency graph

will be directly or indirectly influenced by a policy. A change to a policy may affect (the lifecycle of)

other entities as well.

If there are major changes to other entity types, it is very likely that this will result in the need for changes in policies. See Annex B for a comparison.

Properties of a Policy 4.3.2.5.

| Name | Description | Example(s) |
|---|---|---|
| Name | Name to identify the object | XML processing policy #1 |
| Version | The version number allows assigning each entity to a certain state of the graph. This is necessary for tracking the graph history. | 0.1 |
| Scope | Entities to which the policy applies. | All XML files |
| Owner | Who owns the policy | Data manager |
| Policy/rule | The intention of the policy (i.e. in the event-condition-action form. Event is what triggers the execution of the policy, condition is the logical condition that must be satisfied to execute the policy, and action is what the policy does when it is executed). | If xml file does not conform to DTD or XSD or does not specify a scheme at all, then reject the file, send an error and create an alert to the dashboard. |
| Change parameters | Expressed as ordered pairs of (change type, volatility). This measures how an object is likely to change and how likely this is to occur. For example users are subject to knowledge change, which can be classified according to a certain scheme (e.g. a value between 0 and 1). | (Semantic knowledge, 0.6), (Change of role, 0.1) |
| Sensitivity parameters | Expressed as ordered pairs of (change type, sensitivity expressed as a value between 0 and 1). This measures to what changes (i.e. through dependency on another entity) an object is sensitive and the likelihood of this change occurring. | Regulatory changes: 0.1; Technical changes: 0.5; Organisational changes: 0.8 |

Table 6: Preliminary properties of the entity policy
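As a rough illustration of how the properties in Table 6 could be held together, the following Python sketch models a policy as a record with an event-condition-action rule. It is not the PERICLES data model: the class layout, the event name `file_received` and the helper `lacks_schema_reference` are assumptions made for this example. The condition is also simplified: it only checks whether a document is well-formed and names a DTD or XSD, and does not validate the document against that schema.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Callable, Dict, List

XSI = "http://www.w3.org/2001/XMLSchema-instance"


def lacks_schema_reference(xml_text: str) -> bool:
    """Condition: true if the text is not well-formed XML or names neither a DTD nor an XSD."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return True
    has_xsd = any(
        f"{{{XSI}}}{attr}" in root.attrib
        for attr in ("schemaLocation", "noNamespaceSchemaLocation")
    )
    has_dtd = "<!DOCTYPE" in xml_text
    return not (has_xsd or has_dtd)


@dataclass
class Policy:
    """Hypothetical record mirroring the rows of Table 6."""
    name: str
    version: str
    scope: str
    owner: str
    event: str                           # what triggers the policy
    condition: Callable[[str], bool]     # must hold for the action to run
    action: Callable[[str], List[str]]   # what the policy does
    change_parameters: Dict[str, float] = field(default_factory=dict)
    sensitivity_parameters: Dict[str, float] = field(default_factory=dict)

    def handle(self, event: str, payload: str) -> List[str]:
        """Run the action only if the event matches and the condition holds."""
        if event == self.event and self.condition(payload):
            return self.action(payload)
        return []


xml_policy = Policy(
    name="XML processing policy #1",
    version="0.1",
    scope="All XML files",
    owner="Data manager",
    event="file_received",  # assumed event name
    condition=lacks_schema_reference,
    action=lambda _xml: ["reject file", "send error", "create dashboard alert"],
    sensitivity_parameters={
        "Regulatory changes": 0.1,
        "Technical changes": 0.5,
        "Organisational changes": 0.8,
    },
)
```

Calling `xml_policy.handle("file_received", text)` returns the list of actions to take for a file that fails the condition, and an empty list otherwise. Keeping the condition and action as plain callables keeps the record independent of any particular rule engine.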

State of the art 4.3.2.6.

This section introduces policy layers, from high level to low level policies. An overview of different policy languages follows. The focus is on generic policy languages and not on specific languages for security and access. The last part contains a list of different long term archiving projects that include or deal with policies in different ways.

Common guidelines and principles

Policies can be subdivided into layers [27, pp. 22–24]. Firstly, there are high level policies written in natural language, such as “check the file integrity of the files in the preservation system every month”.

Such high level policies can be translated into technology-independent abstract models that map the high level policies to systems and processes by the use of a formal (policy) specification language (see “Graphical or formal notation” below). The semantics and relations between policies, processes and IT systems are defined at this level.

Next, the abstract layer is transformed into a technology-specific concrete policy layer that is tailored to specific systems.

Finally, the concrete policy layer is transformed into granular parts that can be executed by a technical system.

Figure 4: Policy layers and possible notations for each layer
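To make the layering more tangible, the sketch below follows the file integrity example through the four layers in Python. Only the natural-language statement comes from the text; the dictionary keys, the choice of SHA-256, the 30-day approximation of "every month" and the manifest layout are assumptions made for illustration, and the step from one layer to the next is written by hand rather than derived automatically.

```python
import hashlib
from pathlib import Path
from typing import Dict, List

# Layer 1: high level policy in natural language.
HIGH_LEVEL = "check the file integrity of the files in the preservation system every month"

# Layer 2: abstract, technology-independent model (key names are hypothetical).
ABSTRACT = {"process": "integrity_check", "target": "all_files", "interval_days": 30}

# Layer 3: concrete policy tailored to one system (the algorithm is an assumption).
CONCRETE = {**ABSTRACT, "algorithm": "sha256"}


def sha256_of(path: Path) -> str:
    """Compute the SHA-256 checksum of a file in fixed-size chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Layer 4: executable part that a scheduler could run at the configured interval.
def find_corrupted(root: Path, manifest: Dict[str, str]) -> List[str]:
    """Return relative paths that are missing or whose checksum differs from the manifest."""
    return sorted(
        name
        for name, expected in manifest.items()
        if not (root / name).is_file() or sha256_of(root / name) != expected
    )
```

A manifest here is simply a mapping from relative file names to the checksums recorded at ingest; `find_corrupted` reports every entry that no longer matches.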

Graphical or formal notation

From a more technical perspective, policies can be expressed in policy languages and using rule-based approaches.

There are many policy languages, and a lot of them are specific to an application domain. For example, some are designed to manage access permissions (e.g. Web Services Policy Language WS-XACML28, MPEG-21 Part 5: Rights Expression Language29). Other types of policy languages are intended for general system management, as the following list shows (with the exception of OCL and XACML).

● The Object Constraint Language (OCL) is a declarative textual language for describing rules that apply to Unified Modelling Language (UML) models and is now part of the UML standard [28, p. 5]. The intention of OCL is to express additional constraints and semantics which are not in the graphical model (e.g. the buyer's balance must be greater than the item price).

● Policy Description Language (PDL) from Bell Labs follows the event-condition-action rule

principle. It means that if a certain event occurs the policy is executed. After the event has

triggered the policy the condition is evaluated and if the condition is true the action is

performed [29]. PDL does not support grouping or roles.

● Ponder unifies concepts from different domain-specific policy languages (security, authorisation, information filtering, delegation, refrain, obligation). Ponder supports hierarchical grouping of policies and meta-policies. Meta-policies are policies about policies within a certain scope. They prevent conflicting policies from being executed or limit the permitted policies in a system. Roles are also supported. Roles allow policies to be grouped to reflect the structure of the organisation (manager, technician, …).[30]

● Ponder2 is the successor of Ponder. In comparison to Ponder, Ponder2 is more orientated to self-managing systems, whereas Ponder is more focused on system and general network management. It is not only a standard, but it also provides an implementation including a policy interpreter called PonderTalk (with Smalltalk syntax).[31]

● XACML (eXtensible Access Control Markup Language) is expressed in XML and is an OASIS standard[32]. As the name indicates, it is designed for the expression of access controls in a vendor-independent way. XACML is in this list because the Fedora repository framework uses it. The result of a policy evaluation is essentially a permit or deny decision.

● Rei is a policy language expressed in OWL-Lite. It can describe permissions as well as obligations of entities inside a policy domain. The entity behaviour is driven by the evaluation of the knowledge, which consists of the obligations and permissions of the entity. It also has meta-policies that help in resolving conflicts.[33]

● CIM-SPL is an object-orientated policy language inspired by PDL, Ponder and ACPL. The

policies are expressed as condition action rules like PDL. It allows creating policy groups.

References to other classes can be created with the import statement.[34]

● Rule Interchange Format (RIF) has been developed by a W3C working group [35] and is OWL and RDF compatible. There are several dialects. The basic dialect, RIF-Core (a subset of RIF BLD and RIF PRD with Horn rules and production rules), is expressed in XML. Other

28 See https://www.oasis-open.org/committees/document.php?document_id=24951.

29 See http://www.iso.org/iso/catalogue_detail?csnumber=36095.

dialect types are RIF BLD (Horn logic) and RIF PRD (production rules). The main purpose of RIF is to be an exchange format between different rule engine vendors, but RIF can be used as a rule language on its own. A few software implementations exist.

Relevant LTA approaches and projects

The term policy is often used in a wide range of senses in digital preservation. This varies from high

level documentation of organisational procedures for preservation to granular representations in

concrete policy languages.

The list below describes a number of guides and approaches:

● The “digital preservation policy study” [36] is a guideline for digital preservation for educational institutions.

● The ERPANET project has developed a high-level tool for developing an organisation's preservation policy. The policy should provide guidance and authorisation on the preservation of digital materials and ensure their authenticity, reliability and long-term accessibility. Moreover, a policy should explain how digital preservation can serve major needs of an institution and state some principles and rules on specific aspects which then form the basis of implementation.[37]

● The Research Councils UK has defined the Common Principles on Data Policy.[38]

● Based on existing research data policies, an ASERL/SURA working group defined a template for research data management policies [39]. Templates could be instructive for later WP5 tasks to build a policy editor or to provide guidelines.

● For research data management policies the DCC has collected a list of tools and guides.30

● The PLANETS project has worked on a conceptual model to guide the development of institutional preservation policies.[40] However, it is more focused on a formal description of the procedure to preserve a digital object with the help of guidelines. The guidelines are (preservation) policies, but they are not machine executable. They form a kind of checklist of the information that needs to be captured and are also the basis for a data model.

● The SCAPE project distinguishes between three different preservation policy levels:

1. High level or guidance policies.

2. Preservation procedure policies.

3. Control policies.

The first level describes the general preservation goals of an institution. SCAPE has published

a catalogue of high level policies on their wiki[41]. Ten categories are suggested:

Authenticity, Bit Preservation, Functional Preservation, Digital Object, Metadata, Rights,

Standards, Access, Organisation and Audit and Certification.

The second level describes the approaches of a particular organisation to achieve the high

level goals. They are more detailed and can feed into workflows and processes.

The third level is again more granular. It is intended to be machine readable (and human readable) and is called control policy. This level provides a structured vocabulary that is defined by an OWL ontology. It is intended for use in the planning and watch of preservation items. The class model is structured around a preservation case. A preservation case has an

30 See http://www.dcc.ac.uk/resources/policy-and-legal/policy-tools-and-guidance/policy-tools-and-guidance.

associated organisation, a scope in the organisation, a designated user community, a set of contents (grouping of digital objects), and objectives such as format, access and authenticity. SCAPE refers to a vocabulary list from the TU Wien that defines measures31 (examples for measures are open format, documentation, multiple copies etc.). It is also possible to provide a self-defined list of measures. A policy statement is formed together with objectives (objective examples: must, should, must not). For example: the scanned newspaper of archive X must have (objective) a resolution > 300 dpi (measure). The format of the scanned newspaper should be open.

The Control policies are expressed as RDF. There is a tool that takes a spreadsheet as input

and maps the information into an OWL model. The spreadsheet acts as policy editor. There is

no automated way to transform the policies from layer one to two or from two to three.

SCAPE has developed a Control Policy Model and provides examples for creating control

policies based on the levels one and two[42].

● There are some relevant SHAMAN deliverables32 which deal with preservation policies: each preservation process is associated with a high level policy statement. A policy statement contains several activities (e.g. each incoming word processing file needs to be converted to PDF/A, a checksum must be calculated). This statement is manually mapped into individual actions or workflows expressed in the Rule Interchange Format (RIF). SHAMAN calls this high level, but in fact this is the middle abstract tier of policies. The policies can then be transferred into low level policies. SHAMAN takes iRODS rules as an example. There is no automated way to derive the policies of one level from those of the level above.

SHAMAN also did research on policy layers and organisation. The policies are grouped against the SHAMAN lifecycle phases (creation, assembly, archival, adoption and reuse). Each policy is broken down into sub-policies, each associated with a lifecycle phase. A sub-policy should be atomic. As a result the (sub-)policies interact with each other over several lifecycles. This allows the policies to be grouped in a hierarchical order and traced back to the high level policy statement. It also produces meta-policies, which are a set of policies about policies. Meta-policies help to manage policies; for example, they can enable or disable a policy at a certain date or schedule a policy review.

The PERICLES view is similar, but not exactly the same. SHAMAN makes the statement that the preservation environment can be entirely described by policies and that a change can alter the preservation system. The PERICLES approach is broader; the policies can refer to all parts of the digital ecosystem. The preservation system may be one part of the digital ecosystem, which can contain other systems as well.

● The book "Policy Technologies for Self-Managing Systems" by IBM [27] covers policy technologies for self-managing systems and is not targeted at digital preservation. However, it contains information about general policy-based topics (policy languages,

31 See http://ifs.tuwien.ac.at/dp/vocabulary/quality/measures.

32 These deliverables mostly have a restricted dissemination level but are available internally to the former SHAMAN partners. Especially relevant are SHAMAN D3.4 Automation of Preservation Management Policies and SHAMAN D9.1 Migrating the SHAMAN Preservation Environment.

transformations) and application scenarios such as policy-based configuration management, policy-based fault management and policy-based security management.

● UKDA has a preservation policy[43] that covers all activities relating to preservation across

the archive and is based largely on OAIS principles. There is also a Collections Development

Policy, which balances the constraints of cost, scholarly and historical value, and user

accessibility alongside the requirements of levels of authenticity and legal admissibility.

Hence different ingest processes may be required for material with different levels of quality

and significance. Data collections are assigned an ingest activity level as outlined in the

Archive’s Acquisition Review Process document.

● The “DIGITAL PRESERVATION POLICIES STUDY, Part 1: Final Report October 2008” by Beagrie et al. [36] is focused on high level policies for application in the higher and further education sectors. The document is structured as an overview checklist of what to consider in order to implement general policies. The list links to other approaches for further information.

● The EU project PoSecCo33 focuses on the configuration of a service landscape with security requirements in mind. Policies are therefore rules for authorisation here. PoSecCo compiles configurations for a system landscape based on different information sources such as security requirements, IT policies and other facts.

The deliverable “D1.3 - CONCEPT AND ARCHITECTURE OF THE OVERALL SOLUTION” [44]

defines the overall architecture based on the idea of creating a policy chain from high level to

low level security configurations.

“D2.1 - A Framework for Business Level Policies”[45] deals with state of the art of (IT)

governance frameworks and change management. It develops a model to integrate security

requirements into governance frameworks.

“D2.4 - POLICY HARMONIZATION AND REASONING” [46] could be relevant later when policy

services are developed. It provides ideas on how to check policies for inconsistency and how

to harmonize or simplify policies (in this case security policies).

“D3.5 - MODELS TO REFINE THE IT POLICY AT SERVICE LEVEL” [47] could also be relevant. It

deals with policy refinement, which means in this case to transform a policy into lower levels.

And it has ideas on how to map security policies to a heterogeneous system landscape.

● ESA Policy Guidelines for Space [48] is a set of high level policies in nine categories. They aim to provide guidance on what needs to be done to preserve earth observation data. Each policy has a priority, and the realisation of these policies can be compared and measured, which is expressed as an adherence level.

● iRODS is a community-driven, open source, data grid software solution. iRODS micro-services are small, well-defined procedures/functions that perform simple tasks. Micro-services are developed and made available by system programmers and application programmers and compiled into the iRODS server code. Users and administrators can chain these micro-services to implement more complex functions.[49]

● Fedora implements a policy framework based on XACML (eXtensible Access Control Markup

Language), an OASIS standard. This enables specification of fine-grained, machine-readable

33 See http://www.posecco.eu.

policies that can be used to control access to Fedora web services, Fedora digital objects,

datastreams, disseminations etc.[50]

● The Kindura project developed a rules framework on top of DuraCloud to automate the

replication of content across hybrid cloud storage. This used the Drools rules engine.34 The

policies are used as a decision knowledge base to suggest different storage options and

providers. To enable human editing the rules were extracted from an Excel spreadsheet[51].
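The SCAPE control policy example given earlier in this list (a scanned newspaper must have a resolution > 300 dpi, and its format should be open) can be sketched in Python as follows. SCAPE expresses control policies in RDF against an OWL vocabulary; the plain data class below is only a stand-in for that model, and the field names, the measure names `resolution_dpi` and `open_format`, and the "violation"/"warning" wording are assumptions for this example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class ControlPolicy:
    """Hypothetical stand-in for one control policy statement."""
    scope: str                      # content set the statement applies to
    measure: str                    # name of the measured property
    objective: str                  # "must", "should" or "must not"
    test: Callable[[object], bool]  # predicate over the measured value


def evaluate(policies: List[ControlPolicy], observed: Dict[str, object]) -> List[str]:
    """Compare measured values with the policy statements and list the findings."""
    findings = []
    for policy in policies:
        holds = policy.test(observed[policy.measure])
        if policy.objective == "must" and not holds:
            findings.append(f"violation: {policy.measure}")
        elif policy.objective == "must not" and holds:
            findings.append(f"violation: {policy.measure}")
        elif policy.objective == "should" and not holds:
            findings.append(f"warning: {policy.measure}")
    return findings


newspaper_policies = [
    ControlPolicy("scanned newspaper of archive X", "resolution_dpi", "must", lambda v: v > 300),
    ControlPolicy("scanned newspaper of archive X", "open_format", "should", lambda v: v is True),
]

# Measured properties of one hypothetical scan.
findings = evaluate(newspaper_policies, {"resolution_dpi": 200, "open_format": True})
```

Separating the objective from the measure mirrors the structure of the policy statement in the text: the same measure can be reused with a stricter or weaker objective without changing how it is evaluated.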

Conclusion of state of the art 4.3.2.7.

Different approaches (SCAPE, SHAMAN, IBM Book) have shown that quite often the policies are

categorised into the layers high, abstract and executable. In the literature there seems to be no

automated way to derive policies from one layer to another and keep the relation between the

layers.

There has been research on finding and categorising generic textual policies and compiling generic

policies for long-term preservation. The policies form a categorised catalogue. The SCAPE project has

developed such policies [41]. They could be reused later by parts of Task T5.2.1 (identify a set of

long-term data management policies).

For task T5.2.1 (policy registries) and T5.3.1 (preservation process infrastructure) the ideas from the PoSecCo project might be interesting. It has a different application area, but some concepts might be worth investigating further (such as the policy registry, the derivation of fine-grained policies from common policies, and consistency checks).

4.3.3. Process

A process describes a (business) workflow and contains all the necessary steps to perform a business
activity. The result of a process should produce value for an organisation. Processes can be manual
(sometimes with a formal description), automated (that is, executed by a technical service), or a
combination of both.

Policies may provide the rules from which the process chain can be derived, or policies
may be applied during process execution.

In terms of preservation a process can be one of: (i) a process used to preserve a digital object (a

preservation process), (ii) a supporting process that provides a helper functionality, but does not

operate directly with digital objects, or (iii) a process that itself is described by a digital object that

must be preserved, that is, the series of steps (and associated information) are to be stored so that

they can later be understood or enacted (a process object).

4.3.3.1. Definition

A process is a description of linked steps on how to transform an input to a certain output. A process

can invoke other systems or need human interaction.
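The definition above can be sketched as a small data structure: a process is an ordered list of steps, each transforming its input, and a step may be flagged as requiring human interaction. The class and field names are assumptions made for this sketch; the example name "Convert RAW to TIFF" is borrowed from the properties table later in this section.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List


@dataclass
class Step:
    name: str
    action: Callable[[Any], Any]   # transforms the step input into its output
    manual: bool = False           # True if a human has to perform or confirm it


@dataclass
class Process:
    name: str
    steps: List[Step] = field(default_factory=list)

    def run(self, data: Any) -> Any:
        """Feed the output of each step into the next one."""
        for step in self.steps:
            data = step.action(data)
        return data

    def manual_steps(self) -> List[str]:
        """Names of the steps that need human interaction."""
        return [step.name for step in self.steps if step.manual]


convert = Process("Convert RAW to TIFF", [
    Step("identify format", lambda obj: {**obj, "format": "RAW"}),
    Step("convert", lambda obj: {**obj, "format": "TIFF"}),
    Step("quality check", lambda obj: {**obj, "checked": True}, manual=True),
])
result = convert.run({"id": "img-001"})
```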

34 See DROOLS. Business Logic integration Platform, https://www.jboss.org/drools/.

4.3.3.2. Workflow and (business) process

It is hard to draw a line between workflow and business process. The respective dictionary

definitions are:

Process

“a series of actions that you take in order to achieve a result[...]” [52]

“1. A series of actions or steps taken in order to achieve a particular end[...]” [53]

“1.2 A systematic series of mechanized or chemical operations that are performed in order to produce

something[...]” [53]

Workflow

“the way that a particular type of work is organised, or the order of the stages in a particular work

process [...]” [54]

“The sequence of industrial, administrative, or other processes through which a piece of work passes

from initiation to completion.” [55]

These definitions define a process as a generic way or procedure to do something. A process can

exist in any domain, for example it can be a biological, chemical or technical process.

In contrast the definition of workflow is more specific to a field of application. A workflow is

executed by an organisation and can describe how work is organised or which steps need to be

executed to achieve a certain result. The steps can include processes as part of the workflow.

The term “business process” is a more specific term for process, indicating that a process is
performed by an organisation/company to produce value for the organisation. This sounds similar to
the definition of a workflow. How business process and workflow should be defined for an IT
landscape is contested, and a distinction is made between workflow management systems and
business process management systems.

Workflow Management System (WfMS) vs Business Process Management System (BPMS)

To perform workflows or (business) processes with a computer system, a Workflow Management
System or Business Process Management System is used. The functionality and responsibilities of a
WfMS as opposed to a BPMS are also debated.

A Workflow Management System helps to perform a business workflow. It can contain automated
flows as well as manual user interactions. Traditionally a WfMS is considered to be more
document-flow centric and is a digital representation of records flow or administrative tasks. But it
can also be used to execute processes without user interactions. The WfMS acts as a flow control service.

A Business Process Management System includes the functionality of a WfMS and provides several

additional technologies, for example process optimisation, reporting, dashboards and process

monitoring [56]. There are also several classifications for different types of BPM systems [57].

Summary: a BPM system combines several technologies that help to manage an organisation by

providing workflows, monitoring, prediction and other tools. A workflow system is focused on

defining and executing workflows to perform business processes without the holistic approach of a

BPM system.

4.3.3.3. Lifecycle

Two main factors influence the process lifecycle. A process makes use of several entities,
so a change in one of these entities can require a lifecycle change in the process entity, too. Other
influences are modifications to the digital ecosystem that require new or changed processes or change
the identity of a process. For an overview of lifecycles see Annex B.

4.3.3.4. Dependencies

A process has a lot of dependencies. It involves many entities, it processes or produces digital
objects, and it needs technical services to perform operations. Policies can structure processes or can
be invoked as a knowledge base during runtime. A process can also be dependent on other processes.
For an overview see Annex B.

4.3.3.5. Changes

There are internal and external factors that can lead to a change. Examples of external changes are:
a change in requirements, e.g. due to a shift in the designated community; technical services
becoming obsolete; and economic aspects, e.g. performing tasks of the process becomes cheaper or easier.
Internal changes occur if something changes in dependent entities that are necessary for process
execution. In this case the process needs to be modified to be compatible with the changed entities.
See also Annex B for an overview.

4.3.3.6. Properties of a Process

| Name | Description | Possible values | Example(s) |
|---|---|---|---|
| Name | Name to identify the process | Unique free text | Convert RAW to TIFF |
| Version | The version number allows assigning each entity to a certain state of the graph. This is necessary for tracking the graph history. | Version numbering scheme | 0.1 |
| Link to BPMN model | Contains a link to a BPMN model that describes the process flow and the involved entities in more detail. | Reference to the related BPMN model for the actual process. | - |
| Change parameters (How will an entity itself change?) | Expressed as ordered pairs of (change type, volatility). This measures how an object is likely to change and how likely this is to occur. For example users are subject to knowledge change, which can be classified according to a certain scheme (e.g. a value between 0 and 1). | Types: semantic and terminology, practice, organisational, policy, requirements, technology, other entities and dependency | Work flow reorganisation (organisational): 0.9 |
| Sensitivity parameters (To which change of a dependency is an entity vulnerable?) | Expressed as ordered pairs of (change type, sensitivity expressed as a value between 0 and 1). This measures to what changes (i.e. through dependency on another entity) an object is sensitive and the likelihood of this change occurring. Change types are semantic and terminology, practice, organisational, policy, requirements, technology, other entities and dependency. | Semantic and terminology, practice, organisational, policy, requirements, technology | Policy change: laws governing retention periods = 0.5 |

Table 7: Preliminary properties of the entity Process
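As a rough illustration of how the properties in Table 7 could be held in a data structure, the sketch below stores change and sensitivity parameters as mappings from change type to a value between 0 and 1. The class name, field names and the `most_volatile` helper are assumptions made for this sketch and are not part of the PERICLES model.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class ProcessEntity:
    name: str
    version: str
    bpmn_model: Optional[str] = None  # link to the related BPMN model, if any
    change_parameters: Dict[str, float] = field(default_factory=dict)       # change type -> volatility
    sensitivity_parameters: Dict[str, float] = field(default_factory=dict)  # change type -> sensitivity

    def __post_init__(self) -> None:
        # Both kinds of parameter are expressed as values between 0 and 1.
        for label, params in (("volatility", self.change_parameters),
                              ("sensitivity", self.sensitivity_parameters)):
            for change_type, value in params.items():
                if not 0.0 <= value <= 1.0:
                    raise ValueError(f"{label} for {change_type!r} must be in [0, 1]")

    def most_volatile(self) -> Optional[Tuple[str, float]]:
        """Change type with the highest volatility, or None if none are set."""
        if not self.change_parameters:
            return None
        return max(self.change_parameters.items(), key=lambda pair: pair[1])


entity = ProcessEntity(
    name="Convert RAW to TIFF",
    version="0.1",
    change_parameters={"organisational": 0.9},
    sensitivity_parameters={"policy": 0.5},
)
```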

4.3.3.7. State of the art

The section "Common guidelines and principles" differentiates between scientific and business
workflows. It includes a table that gives an overview of different workflow languages, open source
workflow and business process management systems, and specific tools for scientific workflows.
Different graphical notations exist for processes; they are briefly presented in the section
"Graphical or formal notation". The last section shows how the TIMBUS and SCAPE projects use
processes and workflows for LTA.

Common guidelines and principles

Workflow languages are typically either designed for interpretation by a computer (as a type of high-

level program)35, or designed to describe an existing process that can involve both automated and

manual tasks. We believe that both of these could be of interest to PERICLES and therefore consider

examples of both in the following paragraphs.

Workflow languages are closely coupled with the workflow system being used. A workflow system

typically contains a mechanism to compose a workflow, often a means to map the workflow to an

execution environment, and a workflow engine to manage the execution of the workflow. In some

cases (particularly those in which the mapping process is dynamic) the latter two stages may be

closely linked or even combined.

Workflow languages typically describe data processing elements connected by data flows, or

sometimes by connections that represent flow of control. It is fairly common for a language to have

the ability to express both of these paradigms, but it is usually the case that the language fits more

naturally with either one or the other.[58]

Scientific and business workflow[59]

35 Writing workflows is sometimes referred to as programming in the large.

The main purpose of a scientific workflow system is to provide broad support for managing data

flows. The data can often be complex and high volume. To process this kind of data the data is

distributed to a grid or cluster for faster calculations. A scientific workflow system offers extensive

support for distributing and merging data from the grid/cluster and for transferring the data to

different systems in a specific order. It is a form of batch processing, but in parallel. Limited or no

human interaction during process runtime is involved.
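The distribute, process in parallel and merge pattern described above can be sketched as follows. A local thread pool stands in for the grid or cluster, which is an assumption made purely to keep the example self-contained; the function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def split(data: Sequence[int], parts: int) -> List[Sequence[int]]:
    """Cut the data into at most `parts` chunks of nearly equal size."""
    size = max(1, -(-len(data) // parts))  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def run_scatter_gather(data: Sequence[int],
                       work: Callable[[Sequence[int]], int],
                       merge: Callable[[List[int]], int],
                       workers: int = 4) -> int:
    """Distribute chunks to workers, run them in parallel, then merge the results."""
    chunks = split(data, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(work, chunks))  # map keeps the chunk order
    return merge(partials)


# No human interaction is needed between submitting the data and getting the result.
total = run_scatter_gather(list(range(1, 101)), work=sum, merge=sum)
```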

Surveys of scientific workflow languages are provided in Deelman et al. (2009) [58] and Taylor et al.
(2007) [60], which give a concise and structured overview. The latter offers a useful means of
characterising workflow systems in terms of composition, mapping, execution and monitoring.

In contrast, workflow systems for business processes often involve manual human tasks during

execution. There is strong support for integrating different systems into the workflow process

chain. The workflow system is responsible for the whole process execution and offers transactions

and extended monitoring. It is common to have a model of the workflow and runnable instances of

the model with a reference to the workflow revision. The traceability and fault handling of business

data is very important. There is a detailed fault handling mechanism to cover almost all potential

faults and there are different ways to react to a fault. The process duration varies between

milliseconds and days, or even months. Providing and monitoring quality of service requirements is

necessary.
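The distinction between a workflow model and its runnable instances, together with traceable fault handling, can be sketched as below. This is a simplified illustration under assumed names (`WorkflowModel`, `WorkflowInstance`, the `on_fault` callback), not the design of any particular BPM product.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Optional, Tuple


@dataclass(frozen=True)
class WorkflowModel:
    name: str
    revision: int
    steps: Tuple[Tuple[str, Callable[[Any], Any]], ...]  # (step name, action)


@dataclass
class WorkflowInstance:
    model: WorkflowModel          # each instance references its model revision
    state: str = "created"
    log: List[str] = field(default_factory=list)  # trace of what happened

    def run(self, payload: Any,
            on_fault: Optional[Callable[[str, Any, Exception], Any]] = None) -> Any:
        self.state = "running"
        for name, action in self.model.steps:
            try:
                payload = action(payload)
                self.log.append(f"{name}: ok")
            except Exception as exc:
                self.log.append(f"{name}: fault ({exc})")
                if on_fault is None:
                    self.state = "failed"
                    return payload
                # The handler decides how to continue after the fault.
                payload = on_fault(name, payload, exc)
                self.log.append(f"{name}: compensated")
        self.state = "completed"
        return payload


def validate(order: dict) -> dict:
    if "customer" not in order:
        raise ValueError("missing customer")
    return order


def archive(order: dict) -> dict:
    return {**order, "archived": True}


model = WorkflowModel("order handling", revision=3,
                      steps=(("validate", validate), ("archive", archive)))
ok = WorkflowInstance(model)
ok_result = ok.run({"customer": "ACME"})
failed = WorkflowInstance(model)
failed.run({})
```

The log kept on each instance is what gives traceability: after a failure it shows which step faulted and whether a handler compensated for it.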

The following table gives an overview of different workflow languages, open source workflow /
BPM tools and scientific workflow tools.

| Name | Visual Graph | Textual Flow | Processing Engine | Domain Focus | Notes |
|---|---|---|---|---|---|
| Workflow languages | | | | | |
| Amazon Simple Workflow Service36 | No | Yes | Yes | Distributed workflows in the cloud | Use a web interface to register components and “join” them together. |
| (WS-)BPEL37 | No standard representation. Some vendors have invented their own notations. | XML | No, but several implementations exist38. | Execute business processes via web services | Officially WS-BPEL. “WS-BPEL aims to model the behaviour of both executable and abstract processes.” WS-BPEL. BPMN is often used as graphic representation. |
| BPMN39 | Yes | XML serialisation is defined | No, different approaches exist to map BPMN to BPEL (limited). | Modelling of business processes | BPMN is quite common for graphical process modelling. There are mappings from BPMN to BPEL or XPDL that produce an executable process. |
| DISPEL40 | No | C-like syntax | Yes | Data intensive applications | It is a workflow-based approach to manage data flow through different systems. |
| ebXML Business Process41 (ebBP) | No | XML | No | Model business process in XML for exchange | It could be used for graphical tools. It is not an XML that can be used to perform the process. |
| WS-CDL42 | No | XML | No | Web services | For web service choreography. High level. Designed to be descriptive rather than executable. |
| XPDL43 | No | XML | No | Exchange format for BPMN models | Is an XML representation of BPMN models for exchange with different vendor products. |
| YAWL44 | Yes | XML | Yes | Provides support for workflow patterns. | Has formal semantics. Supports dynamic workflows45. |
| Open source workflow and BPM software | | | | | |
| Activiti46 | Yes | Yes | Yes | Business workflow | General purpose BPM with BPMN 2.0 support. |
| BonitaSoft47 | Yes | Unknown | Yes | Business workflow | Community edition has several limitations. |
| Camunda BPM48 | Yes | XML | Yes | BPM workflow | General purpose BPM with BPMN 2.0 support (forked from Activiti). |
| Intalio community edition49 | Yes | Unknown | Yes | BPM workflow | Community edition has several limitations, only 80% open source. |
| jBPM50 | Yes | Yes | Yes | BPM workflow | Modelling in Eclipse, BPMN 2.0, dashboard is web based. |
| ProcessMaker51 | Yes | Yes | Yes | BPM workflow | Community edition has several limitations. |
| Workflow tools for scientific applications | | | | | |
| Apache Airavata52 | Yes | Yes | Yes | Data intensive applications | Managing data on distributed systems. |
| Kepler53 | Yes | Yes | Yes | Data intensive applications | Managing data on distributed systems. |
| KNIME54 | Yes | Yes | Yes | Data intensive applications | Managing data on distributed systems. |
| Meandre55 | Yes | Yes | Yes | Data intensive applications | Managing data on distributed systems. |
| Pegasus56 | No | Yes | Yes | Data intensive applications | Managing data on distributed systems. |
| Taverna57 | Yes | Yes | Yes | Data intensive applications | Managing data on distributed systems. |
| Triana58 | Yes | Yes | Yes | Data intensive applications | Managing data on distributed systems. |
| VistRails59 | Yes | Yes | Yes | Data intensive applications | Managing data on distributed systems. |

Table 8: summary of workflow languages, open source BPM and workflow tools and scientific workflow tools

36 http://aws.amazon.com/swf/.
37 OASIS Web Services Business Process Execution Language (WSBPEL) TC. Online at https://www.oasis-open.org/committees/tc_home.php?wg_abbrev=wsbpel.
38 See https://en.wikipedia.org/wiki/List_of_BPEL_engines.
39 See http://www.bpmn.org.
40 See ADMIRE Project, http://www.admire-project.eu/docs/DISPEL-manual.pdf.
41 See http://docs.oasis-open.org/ebxml-bp/2.0.4/ebxmlbp-v2.0.4-Spec-os-en.html/ebxmlbp-v2.0.4-Spec-os-en.htm.
42 See http://www.w3.org/TR/ws-cdl-10/.
43 See http://www.xpdl.org/standards/xpdl-2.2/XPDL%202.2%20%282012-08-30%29.pdf.
44 See http://yawlfoundation.org/.
45 See http://yawlfoundation.org/pages/support/faq.html.
46 See http://www.activiti.org.
47 See http://community.bonitasoft.com.
48 See http://camunda.org.
49 See http://www.intalio.com/.
50 See http://www.jboss.org/jbpm.
51 See http://www.processmaker.com.

A few examples of commercial workflow and BPM software

● IBM BPM60

● Oracle Business Process Management Suite (BPM)61

● SAP BPM62

● Software AG webMethods BPM63

● W4 BPMN+64

Other flow based languages, tools and scripting

This kind of software is not workflow or BPM software. It allows scripting or modelling of a (data)

flow, but it is either intended for a specific application area or assembles just a particular technical

process without any workflow options. Examples include:

● Apache Pig65

● Swift/T66

● Pentaho Data Integration (Kettle)67

52 See http://airavata.apache.org.
53 See http://kepler-project.org.
54 See http://www.knime.org.
55 See http://www.seasr.org/meandre/ and ZigZag http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.160.1060&rep=rep1&type=pdf.
56 See http://pegasus.isi.edu.
57 See http://www.taverna.org.uk.
58 See http://www.trianacode.org.
59 See http://www.vistrails.org.
60 See http://www-03.ibm.com/software/products/en/category/bpm-software.
61 See http://www.oracle.com/de/technologies/bpm/overview/index.html.
62 See http://www.sap.com/pc/tech/business-process-management.html.
63 See https://www.softwareag.com/de/products/wm/bpm.
64 See http://en.w4software.com/product/bpmn-plus.htm.

Graphical or formal notation

There are existing methods and notations for process modelling. Some are very generic, like
flowcharts, Petri nets, and UML state and activity diagrams. Other notations (EPC, BPMN, IDEF0, YAWL)
are specially designed for modelling processes and have stricter semantics. The more popular of
these methods are briefly described below.

● Flowchart: This is a simple diagram type that allows a decision flow to be modelled [61]. It
can be used to model processes as well, because a process consists of linked steps together
with decisions about which part of the flowchart to follow next, depending on the
decision result.

● Petri nets are a mathematical description to model distributed systems. They can be used in
a broad range of application fields: simulation, logistics, biology, business process modelling etc. A
Petri net contains four base elements: place, transition, arc and token. Places represent
conditions, input data, things or storage; transitions can modify places; arcs are edges
(connections) between the elements; and tokens reside inside places and can flow along the Petri net
based on the transitions. With these elements, forks and joins including synchronization
conditions can be modelled.[62]

● UML (Unified Modelling Language): UML contains several diagram types: class, component,

composite, deployment, object, package, profile, activity, communication, interaction

overview, sequence, state, timing and use case diagrams. The state diagram [63] is similar to
Petri nets, so it can also be used to model processes.

The activity diagram [64], which is a representation of a flow of control, can also be used for

modelling processes. It is similar to a flowchart. For high level modelling a use case diagram

[65] is partly usable.

● EPC (Event-driven Process Chain) was developed for ARIS (Architecture of Integrated
Information Systems) and is a type of flowchart designed for business process modelling. It is
a simple modelling notation that basically consists of passive events (start/stop, others), functions
(process steps, activities), logical connectors (branches, fork, AND, OR, XOR), involvement of
persons (organization unit) and information flow.[66]

● BPMN (Business Process Model and Notation), developed by the Business Process Management
Initiative, is widely used to model business processes and offers more functionality than the
modelling types described above. BPMN contains a start event, which is the process trigger;
activities, which are the process steps; and gateways to fork and merge branches. These
object types can be connected by a sequence flow, which connects steps in sequence; a
message flow, which shows messages between participants; and associations, which can be
used to link non-flow objects to provide additional documentation. Swimlanes, called pool
and lane, can be used to group parts of the process flow. Transactions are supported, as are
alternative flow paths such as exception handling.68 Besides more objects and better integration
of manual human tasks, BPMN 2.0 provides an XML serialisation of the model and a
formalization of the implicit meta-model of BPMN.
Many tools for BPMN modelling are available.[67] BPMN models are often used to derive
machine executable processes.

65 See https://pig.apache.org.
66 See http://www.mcs.anl.gov/exm/local/guides/swift.html.
67 See http://community.pentaho.com/projects/data-integration/.

● IDEF0 (ICAM Definition language, where ICAM stands for Integrated Computer-Aided
Manufacturing) has its origins in the U.S. Air Force and is designed for modelling the decisions and
activities of systems. In contrast to the other approaches, IDEF0 is more top-down
oriented. Each activity has an associated context in modelling, and for more than a few steps
creating sub-diagrams and hierarchical diagrams is mandatory. It supports forks and
joins.[68]

● YAWL (Yet Another Workflow Language) is based on Petri nets. The language tries to
minimize the components needed for modelling and provides comprehensive support for
workflow patterns. The benefit of YAWL is that it has formal semantics that can be
validated, a feature inherited from the underlying Petri nets. There is also a reference
implementation of an engine, which is likewise named YAWL.[69]
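Of the notations listed above, Petri nets are the easiest to make concrete in a few lines. The following minimal sketch models places, transitions, arcs (as input and output place lists) and tokens, and uses them to express a fork and a synchronising join. The class and the place and transition names are assumptions for illustration only.

```python
from typing import Dict, List, Tuple


class PetriNet:
    def __init__(self, marking: Dict[str, int]):
        self.marking = dict(marking)  # place -> number of tokens
        self.transitions: Dict[str, Tuple[List[str], List[str]]] = {}

    def add_transition(self, name: str, inputs: List[str], outputs: List[str]) -> None:
        """Arcs are given implicitly by the input and output places."""
        self.transitions[name] = (list(inputs), list(outputs))

    def enabled(self, name: str) -> bool:
        """A transition is enabled if every input place holds enough tokens."""
        inputs, _ = self.transitions[name]
        return all(self.marking.get(p, 0) >= inputs.count(p) for p in set(inputs))

    def fire(self, name: str) -> None:
        """Consume one token per input arc and produce one per output arc."""
        if not self.enabled(name):
            raise RuntimeError(f"transition {name!r} is not enabled")
        inputs, outputs = self.transitions[name]
        for place in inputs:
            self.marking[place] -= 1
        for place in outputs:
            self.marking[place] = self.marking.get(place, 0) + 1


net = PetriNet({"start": 1})
net.add_transition("fork", ["start"], ["a", "b"])           # one token becomes two branches
net.add_transition("work_a", ["a"], ["a_done"])
net.add_transition("work_b", ["b"], ["b_done"])
net.add_transition("join", ["a_done", "b_done"], ["end"])   # waits for both branches

net.fire("fork")
net.fire("work_a")
join_ready_early = net.enabled("join")  # branch b has not finished yet
net.fire("work_b")
net.fire("join")
```

The join transition only becomes enabled once both branches have delivered a token, which is the synchronization condition mentioned in the Petri net description.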

Relevant LTA approaches and projects

TIMBUS

TIMBUS aims to preserve business processes and services. Three phases are defined to
achieve this [70]:

Phase 1 planning - capture process and its context

In the first phase the business process, context and stakeholders are captured. BPMN notation and

ArchiMate are used as representations. After that a risk management analysis is performed. The goal

is to identify potential risks and to choose a preservation method.

After the exploratory work is completed, the preservation approaches are chosen. First the
significant properties of the process are determined. This step includes decisions about the

redeployment scenarios (e.g. execute original process with original data, execute original process

with new data or execute modified process with original data). The goal is to maintain the logic of

the process, which means that some implementation details can be abstracted or replaced. Because

a process can be quite complex, different preservation methods may be combined (migration,

emulation and virtualisation). Depending on the requirements a certain strategy is evaluated. The

strategy could include mock up services if the process is dependent on external services that are not

under the control of the process.

Phase 2 preservation phase

In this phase the software and data of the process are captured and the selected preservation
methods are applied to the data. In addition the behaviour and performance of the process are
captured. This data will be used in the redeployment phase to compare the new behaviour of the
process against the original process.

68 See http://www.omg.org/cgi-bin/doc?dtc/10-06-02.

Phase 3 redeployment phase

At some point in the future the process may be reactivated. Before the reactivation happens a gap

analysis between the requirements of the preserved process and the possibilities of the new

environment is done. After the gap analysis the new environment is prepared for the redeployment

of the preserved process. To close the identified gaps, different preservation approaches are possible

(emulation, format migration). The output of the newly deployed process is compared to the saved

behaviour of the original process.
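The idea of capturing behaviour during preservation and comparing it after redeployment can be sketched as a simple record-and-replay check. This is not TIMBUS tooling; the function names and the toy processes are hypothetical, and a real comparison would also have to cover performance and context.

```python
from typing import Any, Callable, List, Tuple


def capture_behaviour(process: Callable[[Any], Any],
                      inputs: List[Any]) -> List[Tuple[Any, Any]]:
    """Preservation phase: record the output of the original process per input."""
    return [(value, process(value)) for value in inputs]


def verify_redeployment(redeployed: Callable[[Any], Any],
                        baseline: List[Tuple[Any, Any]]) -> List[Tuple[Any, Any, Any]]:
    """Redeployment phase: return (input, expected, actual) for every deviation."""
    deviations = []
    for value, expected in baseline:
        actual = redeployed(value)
        if actual != expected:
            deviations.append((value, expected, actual))
    return deviations


def original(text: str) -> str:
    return text.upper()


def migrated(text: str) -> str:
    # A redeployed variant that silently trims whitespace.
    return text.upper().strip()


baseline = capture_behaviour(original, ["a", " b "])
same = verify_redeployment(str.upper, baseline)
different = verify_redeployment(migrated, baseline)
```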

Relation to PERICLES

Digital ecosystem modelling aims to capture dependencies between the entities. To map user

activities to instances of the entities, e.g. Technical Service, it is necessary to understand the system

landscape. The same applies within TIMBUS but the model of the system is needed to correctly

capture the context of a business process. ArchiMate and other modelling approaches are used for

this. This is the same for PERICLES. But after this step the approaches differ. TIMBUS then captures

the business process together with data for redeployment comparison.

In PERICLES the dependency models are created after the system landscape modelling to allow
statements to be made about what happens if an entity or dependency is about to
change. PERICLES also deals with processes that are less structured and more ad hoc. The
environment of an unstructured workflow is fuzzier.
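What a dependency model allows, namely statements about the consequences of a change, can be illustrated with a small graph traversal. The entity names and the dictionary layout below are assumptions for this sketch and do not reflect the actual PERICLES model.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical dependency model: each entity lists the entities it depends on.
DEPENDS_ON: Dict[str, List[str]] = {
    "Process: Convert RAW to TIFF": ["Technical Service: Converter", "Policy: Retention"],
    "Technical Service: Converter": ["Technical Service: Storage"],
    "Digital Object: TIFF master": ["Process: Convert RAW to TIFF"],
}


def affected_by(changed: str, depends_on: Dict[str, List[str]]) -> Set[str]:
    """Return every entity that directly or indirectly depends on `changed`."""
    # Invert the edges: for each entity, who depends on it?
    dependants: Dict[str, Set[str]] = {}
    for entity, dependencies in depends_on.items():
        for dependency in dependencies:
            dependants.setdefault(dependency, set()).add(entity)

    seen: Set[str] = set()
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for dependant in sorted(dependants.get(current, ())):
            if dependant not in seen:
                seen.add(dependant)
                queue.append(dependant)
    return seen


impact = affected_by("Technical Service: Storage", DEPENDS_ON)
```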

SCAPE

SCAPE uses the Taverna workflow engine on different levels to execute workflows that perform

preservation actions. The workflows are a bridge between the (preservation) system and the storage.

They are an abstraction of the underlying storage and they can work with different storage types and
providers. Taverna workflows are listed in a catalogue for re-using common workflows. It is also part

of the PLATO preservation planning tool [71].

Conclusion of state of the art

Different kinds of graphical notations for modelling processes/workflows exist. The BPMN

representation is widely used. There is a textual XML representation of the graphical BPMN

elements, so BPMN can also be used as exchange format among different modelling/BPM tools. A

broad set of different tools exists with BPMN support. It is possible to map BPMN to XPDL or BPEL as

executable formats.

YAWL is an alternative to BPMN and might be worth investigating within PERICLES. It consists of a

modelling language and an execution environment. YAWL is more workflow pattern focused and

there is a direct mapping to an executable form. In contrast, the mappings from BPMN to XPDL or

BPEL do not cover all aspects of the BPMN model.

Other graphical notations are only for process modelling and there is no transition to an executable
process, with the exception of the event-driven process chain (EPC), which is mainly used by SAP.

Many different workflow and BPM systems are on the market, including commercial and open

source systems. One must distinguish between systems for business workflows and systems for

scientific workflows. Scientific workflows deal with automatic data flow control on grids or clusters

with limited or no user interaction. Business workflow systems on the other hand may have manual

user interactions and are focused on integrating different systems into the process chain and

reliable, traceable workflow processing.

The choice of a BPM or workflow system depends on the requirements of the analysed process flow.
If the process steps are exposed as web services by other systems, then a workflow system
designed for web services should be chosen.

If the process chain is a scientific workflow, a workflow system like Kepler or Taverna should be
chosen. A scientific workflow system may be combined with a business workflow system. If data
streaming is a key component of the workflow (audio, video) then DISPEL may be worth
investigating.

4.3.4. Technical Service

This is a very generic entity type. A technical service can be any hardware and software that is either

involved in the preservation process or in the interaction with digital objects.

4.3.4.1. Definition

A technical service consists of hardware and software. The software typically governs the behaviour

of a technical service. We use “service” to mean any operation that a technical service offers to the

outside. It can be a user interface, a service that provides value to an organisation or a Technical

Service used for automated machine to machine communication.

4.3.4.2. Lifecycle

Currently a typical lifecycle of a technical service (TS) lasts roughly five years. It is to be
expected that in every lifecycle the hardware will be upgraded to the latest technology. The same
applies to software: either a system is replaced or it is updated to support changes of the entities or
new technologies and features. The new features can be driven by requirements of the user. Software
updates normally happen more frequently than every five years. See also Annex B.

4.3.4.3. Dependencies

A technical service acts as a gateway to all entities, so a lot of dependencies are to be expected.
A technical service is also often dependent on other technical services. Policies could define an
outline of an overall system landscape structure, restrict technical services or act as a rule base. For a
comparison see Annex B.

4.3.4.4. Changes

A TS covers a broad range of components, such as hardware and software plus interfaces to the
outside, so many kinds of change are possible. There are internal changes, for example an entity is
changed in such a way that the TS cannot use it any more (format change of a DO, modification of the
system architecture plan). Most of the changes to a TS are external, for example the user
requirements change, the system or hardware becomes obsolete, or there are better ways to do
something. Annex B gives an overview of the different causes.

4.3.4.5. Properties of a Technical Service

| Name | Description | Example(s) |
|---|---|---|
| Name | Name to identify the technical service | Flatbed image Scan station #1 |
| Type of resource | Is it software, hardware, storage, compute? | Hardware |
| Change parameters | Expressed as ordered pairs of (change type, volatility). This measures how an object is likely to change and how likely this is to occur. For example users are subject to knowledge change, which can be classified according to a certain scheme (e.g. a value between 0 and 1). | (Semantic knowledge: 0.5), (Change of role: 0.1) |
| Sensitivity parameters | Expressed as ordered pairs of (change type, sensitivity expressed as a value between 0 and 1). This measures to what changes (i.e. through dependency on another entity) an object is sensitive and the likelihood of this change occurring. | Technological changes: 0.2; Semantic changes: 0.2; Structural changes: 0.5 |

Table 9: Preliminary properties of the entity Technical Service

4.3.4.6. State of the art

Common guidelines and principles

Technical service

A technical service is any kind of external interface. There are three main types of interfaces: user

interfaces, server systems and middleware.

User interface: this is any interface a user interacts with. This is a broad field, and a user
interface can take many forms. It can be a hardware display with only a few bulbs and scales, where
each visual indication has a certain meaning, or it can be a more advanced computer screen
application. Screen applications fall into two classes: local applications and remote applications.
In most cases the application logic is embedded in a local application, whereas for a remote
application the main application logic resides on the server. If the application logic is located
elsewhere, the application is called a client, which normally provides the user interface. The
client may still contain application logic, but it should not contain much.

Server system: a system that exposes interfaces to the outside. One directly accessible form is a web
interface. A server system can also provide many different types of interfaces that are intended for
machine to machine communication. One openly specified form is web services, which expose the
interfaces via HTTP. In addition there are many other protocol types known under the term remote
procedure call (RPC), most of which are proprietary.

Middleware: if many (server) systems with different interfaces need to be connected with each
other, or some logic needs to be executed between them, middleware software is used. Middleware is
the glue between several systems and can expose new interfaces as well.

Graphical or formal notation

There are three categories for modelling a Technical Service. One is the modelling of the whole IT

landscape which includes all Technical Services, the connection of the services, business activities,

business processes, users and often non-technical processes as well. This is known as enterprise

architecture modelling.

Another form is enterprise architecture frameworks. Besides modelling these frameworks provide

strategies, procedures and processes on how to organize an organisation and how to solve business

problems. They are often proprietary.

Depending on the chosen enterprise architecture (modelling) approach the modelling might be

limited and might be too high level to model a TS. Also the enterprise architecture modelling may

cover a lot of other things that are not needed to model a TS.

On the other hand, popular modelling languages in software engineering allow modelling not only the internal structure of software, but also a system and its context. These modelling languages may be limited in their ability to model the interaction between a TS and other processes in comparison to enterprise modelling.

The following list contains one or two popular modelling languages for each category.

Enterprise architecture framework

TOGAF stands for The Open Group Architecture Framework. It defines four main architecture

categories Business Architecture, Application Architecture, Data Architecture and Technology

Architecture. Business Architecture deals with strategies, processes and activities of the organisation.

Application Architecture covers all (software) applications that are necessary to perform the

functionality required by the Business Architecture. The Data Architecture covers all information that is necessary to execute the Business Architecture. The last category, Technology Architecture, is for modelling the technical IT infrastructure [72]. Modelling a TS involves the Application and Technology categories.

Enterprise architecture modelling

ArchiMate is an open standard from The Open Group for enterprise modelling. It contains a Business layer, Application layer, Technology layer, and Implementation & Migration. It looks similar to TOGAF but the meta-models are not entirely compatible. The main difference is that TOGAF provides a process and method but no notation69, whereas ArchiMate provides a notation but no method.

69 There are unofficial third-party attempts to create graphical notations for TOGAF, see http://www.togaf-modeling.org/.

Since ArchiMate 2.0 the integration between TOGAF and ArchiMate has been improved. TOGAF and

ArchiMate can be used together [73].

General modelling languages for software engineering

UML provides several diagram types to model a TS. A component diagram70 allows modelling of loosely coupled systems. An information flow diagram71 or a communication diagram72 provides an overview of the messages and functions that are exchanged between the systems or software. With a package diagram73 a detailed overview of the software architecture can be given. The deployment diagram74 contains the necessary services and dependencies that are needed to run software on a technical service.

SysML is a derivation of the UML modelling standard that extends it for system modelling. In addition to the UML diagrams described above, a block definition diagram can be used for system modelling. A block definition diagram can take different forms. As a system context diagram it can provide a high level overview to depict the structure of the involved systems. A black box view of the system can be modelled with a regular block definition diagram. An internal white box view of the system can be expressed with an internal block diagram.

Relevant LTA approaches and projects

There are currently three main strategies for LTA of Technical Services:

1. Create one's own archive or solution for long term archiving

In this case the institution creates its own archive and solution for its artefacts.

2. Distributed file copying

With distributed file copying it is ensured that the bitstream data is saved in different locations. Often, saving data at different locations is done through cooperation between institutions. This ensures that the raw data is available and the copies are valid, but does not solve any other LTA problems.

Example approaches for distributed file copying are LOCKSS and EUDAT. LOCKSS (Lots Of

Copies Keep Stuff Safe) is an open source server solution for libraries. It was mainly intended

for electronic journals, but has been extended to other media types. In addition to the

software there is a global LOCKSS network for exchanging data, but it is also possible to

create private LOCKSS networks. Each institution can choose which data should be mirrored [74].

EUDAT (European Data Infrastructure) is a network for exchanging research data. In addition

to the file replication, which is realised with iRODS, it offers additional services for finding,

sharing and computing of research data[75].

70 See http://www.uml-diagrams.org/component-diagrams.html.

71 See http://www.uml-diagrams.org/information-flow-diagrams.html.

72 See http://www.uml-diagrams.org/communication-diagrams.html.

73 See http://www.uml-diagrams.org/package-diagrams-overview.html.

74 See http://www.uml-diagrams.org/deployment-diagrams-overview.html#deployment-manifestation-artifacts.

3. Micro services that are replaceable.

This approach avoids creating a monolithic (preservation) system. The required functionality

is split into small function blocks that have a defined interface. It is now possible to reuse and

link these blocks to create a greater functionality. Also function blocks can be easily replaced

as long as the interface stays the same.
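The idea of replaceable function blocks behind a fixed interface can be illustrated with a minimal sketch. All names here (`ChecksumService`, `Sha256Checksum`, `Sha512Checksum`, `ingest`) are hypothetical and chosen only for illustration; they do not come from any particular preservation system.

```python
import hashlib
from typing import Dict, Protocol


class ChecksumService(Protocol):
    """Hypothetical interface of one small function block (fixity calculation)."""

    def checksum(self, data: bytes) -> str: ...


class Sha256Checksum:
    """One implementation of the block."""

    def checksum(self, data: bytes) -> str:
        return hashlib.sha256(data).hexdigest()


class Sha512Checksum:
    """A replacement implementation with the same interface."""

    def checksum(self, data: bytes) -> str:
        return hashlib.sha512(data).hexdigest()


def ingest(data: bytes, fixity: ChecksumService) -> Dict[str, object]:
    """A larger function built by linking blocks; it depends only on the interface."""
    return {"size": len(data), "fixity": fixity.checksum(data)}


# Swapping the block does not require any change to ingest().
record_a = ingest(b"example bitstream", Sha256Checksum())
record_b = ingest(b"example bitstream", Sha512Checksum())
```

Because `ingest` only relies on the `checksum` method, either block can be retired or replaced without touching the calling code, which is the property the micro service approach aims for.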

Conclusion of the state of the art

For modelling Technical Services there are several options. The choice depends on how fine granular

the model should be and if the surrounding systems or other things should be modelled, too. The

recommendation for depicting the functionality and involved systems are SysML block diagrams.

ArchiMate is recommended for a more holistic picture of the system landscape.

A lot of effort has been made in the past on the preservation and migration of files. The current approach is to use micro services, often in combination with distributed file systems.

4.3.5. User community

The term “User” is widely used, and its meaning depends on the context in which it is used. We limit this entity to people who are relevant for evaluating the usefulness of digital objects, e.g. because they want to make use of the digital object or they influence how a digital object should be used. Examples of the latter could be artists, who might have clear requirements for how their creation should look. We exclude people whose role it is to run or manage the involved infrastructure or systems and who are only “users” in this limited sense.

4.3.5.1. Definition

OAIS defines the term "Designated Communities" to denote a group of users: "An identified group of

potential Consumers who should be able to understand a particular set of information. The

Designated Community may be composed of multiple user communities. A Designated Community is

defined by the Archive and this definition may change over time." [4, pp. 1–11]

In PERICLES the users are defined as the stakeholders interested in the future usage of the digital

objects. This does not have to be limited to the persons actually using the digital objects but can also

include the creators of the digital object if they e.g. have authority over the correct interpretation or

usage of the object. The expectations of the users in this sense have to be monitored and PERICLES

does not assume that the institution can choose their users. The users can be grouped into user

communities based on the activities they want to perform.

4.3.5.2. Lifecycle

The term lifecycle can apply to different types of entity, but it normally refers to the lifecycle of digital objects. In this case a user represents a model of a user community. The main task of lifecycle management for users is to watch for any community changes and propose a strategy to adapt to them. Other typical, but less frequent, events are the retirement and the creation of a user community.

It is also quite likely that there will be people who make use of the digital objects without being part of a designated community, whom the institution did not actively intend to serve and for whom it did not design its systems. For the purposes of this document, we do not attempt to cater for these people, other than to take into account that they may become a designated community if the institution decides to make them one. The OAIS defines an activity "Monitor Designated Community" which can trigger reactions to changes in the designated community as necessary for digital preservation.

The lifecycle of a user community (as defined here) can be considered to be quite clear-cut in that communities become “designated” at a precise time as decided by the repository. The interesting aspect is that in the wider world, communities form, join, split, and dissolve over extended periods of time and there may not be a clear trigger for the repository to consider a change to its designated communities.

A change in the properties of an individual user only becomes relevant when considering access. At this point, you may need to consider whether or not an individual is a member of a community and how to deal with individuals joining and leaving communities. This also depends on the policies regarding access to the repository, and whether access rights are dependent on community membership, or whether they are governed by something else (such as being a member of a funding institution).

4.3.5.3. Dependencies

A user is part of the digital ecosystem and, in common with other entities, the user has a state and dependencies. The user can be passive and just consume the information, or the user can be active. Active means that the user takes part in a process as an actor (e.g. submitting new digital objects and adding context information).

A user needs to utilise a technical service to get access to the entities. A user community has implicit or explicit requirements about what they want to see or what a system should do. The requirements, or a change in them, may cause a change in other entities, e.g. policies might be triggered. On the other hand, policies may also constrain what a user community can do. For an overview of dependencies see Annex B.

4.3.5.4. Changes

It can happen that the community knowledge and terminology changes over time. At a certain point it may be necessary to modify the (eco-) system or more specifically the entities and their relationships to fulfil the user requirements. Semantic and user evolution will be the focus of later PERICLES deliverables which investigate how the model can be extended to enable knowledge and terminology change.

A change may also come from modifications of the (eco-) system itself. For example a process or technical service needs to be replaced or retired due to obsolescence. This could affect the user community (e.g. a new graphical interface affects the interaction with a technical system). For a list of potential changes see Annex B.

4.3.5.5. Properties of a User / User community

The exact modelling and properties of a user entity in the digital ecosystem model will be part of a more detailed investigation later in the PERICLES project. The currently described properties are preliminary.

Name | Description | Example(s)

Name | Name of the user community | SOLAR scientist

Identifier | Identifier that assigns a user to a designated user community. The identifier refers to a detailed description of the user community. The description could contain information about the community's terminology, knowledge and main business activities. | solar-scientist

Version | The version number allows assigning each entity to a certain state of the graph. This is necessary for tracking the graph history. | 0.1

User role | Chosen from a defined set of user roles, see Annex B: Stakeholder roles | SOLAR Scientist, Curator

Change parameters | Ordered pairs of (change type, volatility). For example users are subject to knowledge change, which can be classified according to a certain weight (e.g. a value between 0 and 1). | (Semantic knowledge, 0.3), (Change of role, 0.4)
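As a rough illustration of how these preliminary properties might be held in a model, the following sketch represents a user community as a Python data class. The class name `UserCommunity` and its field names are assumptions made for this example; the report does not prescribe a concrete representation, and the properties themselves are still preliminary.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class UserCommunity:
    """Hypothetical record mirroring the preliminary properties listed above."""

    name: str
    identifier: str
    version: str
    user_roles: List[str]
    # Ordered pairs of (change type, volatility), volatility between 0 and 1.
    change_parameters: List[Tuple[str, float]] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Reject volatility weights outside the range given in the description.
        for change_type, volatility in self.change_parameters:
            if not 0.0 <= volatility <= 1.0:
                raise ValueError(
                    f"volatility for {change_type!r} must lie between 0 and 1"
                )


# The example values from the table above.
solar = UserCommunity(
    name="SOLAR scientist",
    identifier="solar-scientist",
    version="0.1",
    user_roles=["SOLAR Scientist", "Curator"],
    change_parameters=[("Semantic knowledge", 0.3), ("Change of role", 0.4)],
)
```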

4.4. Change

This subsection covers the aspects of change. Change can take a number of forms and this section provides a categorisation of such change.

4.4.1. Definition

The word change is very generic as the entry in a dictionary indicates: “an act or process through

which something becomes different”[76].

This general definition of change should be extended to include information about the

consequences. For PERICLES, only significant change needs to be considered. Significant means in our

context that something in the digital ecosystem needs to be considered to maintain the usefulness of

parts of the digital ecosystem, usually certain digital objects: “Change is an act or process through

which something becomes different. The difference is significant if entities need to be adapted to

maintain the usefulness of parts of the digital ecosystem.”

4.4.2. Overview of types of change

The analyses of the different entities regarding types of change have shown that there are two main groups that could cause a change. One is semantic change. This can emerge if there are changes in the knowledge or the habits of a user community. If it is a significant change then it will lead to one or more of the structural change types. Structural changes are not necessarily dependent on semantic changes. They can have different causes: for example, hardware needs to be replaced, new features should be implemented, legal changes require a policy change, or an entity type will be processed by another process or technical service. Figure 5 groups changes into two categories. Semantic change is one category, e.g. new terminology, a change in the practice of the user, or a reorganisation of the institution. The second category is structural change. This covers all other changes that require a modification of the digital ecosystem. Examples are new requirements, a changed policy or an updated software system. A significant semantic change leads to structural changes. The result of each change is a change operation such as create, modify or delete.

Figure 5: Overview types of change

Type of change | Description | Example

Knowledge and terminology | Changes in semantics that originate from the designated user community. If they are significant, they lead to structural changes. | Einstein and Newton had a significantly different view of time and space. Climate scientists and solar scientists can use the same underlying datasets with different goals. Their terminology and background knowledge will be different.

Practice | This change originates from new or changed habits of the designated user community (not necessarily related to knowledge and terminology changes). It is an indicator that user requirements may change. | The user (community) starts to read everything on eBooks and they expect that all material is provided as an eBook.

Organisational | An organisational change is an overall change that could affect the digital ecosystem. There are many different origins. It can be a political, financial or a strategic decision. The result can be any of the structural change types. Often an organisational change leads to a policy change. | The needs of a new user community with different knowledge have to be addressed. Two different repositories are merged because the institutions have merged. Three different subjects are merged into one department. This is reflected in the digital ecosystem. A manual approval process for new documents is introduced.

Policy | The term policy covers changes in permissions, legal changes, quality assurance and strategy. | The records have a lock-up period of 10 years.

Requirement | Any change in requirements. There are different types. Business: high-level goals of an organisation. Functional: technical requirements that a system should fulfil. Quality of service (QOS): characteristic requirements like response time, availability, extensibility. User: requirements of a certain designated community. User requirements are not necessarily bound to semantic or habit change. | Business: provide free access to our publications over the internet. Functional: the catalogue software should run on tablets. QOS: there must be no data loss on the objects. User requirements: a new user community is added. They want to see the objects in a special way. Several entities have to be modified.

Technology | All changes that come from the technology side and that require a change. This is a broad category that includes hardware and software, interfaces and technical services. | A software vendor has gone out of business. The software systems need to be replaced in the near future.

Other entities | Includes changes to the other three entity types Digital Object, User and Process | A Process and a Digital Object are updated to include new metadata attributes; there is a new interface that an existing Process should use.

Dependency | This category contains all changes of dependencies. Either characteristic attributes of a dependency are changed (e.g. quicker, faster, more flexible, cheaper) or the dependencies themselves change (e.g. a digital object will no longer be processed by system a; in future, it will be processed by systems b and c). | A scan service provider becomes more expensive, but the available budget does not increase. The number of scanned pages has to be reduced or an in-house solution has to be considered.

Create, Modify, Delete | These are the operations that follow a change. They need a detailed description of what is modified (not just “modify” but something like add attribute a, delete a process step from process b...) | An updated metadata creation process produces additional fields. The entity Digital Object needs to be modified to support these new metadata fields. The entities database and application code need to be altered as well because they depend on a certain structure of the Digital Object.

Table 10: Classification of change types
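The classification can be sketched as a small lookup structure. The assignment of each change type to the semantic or structural category below is one reading of the description of Figure 5 and Table 10, not a definitive mapping, and all identifiers (`ChangeCategory`, `Operation`, `CHANGE_TYPES`, `requires_structural_follow_up`) are hypothetical.

```python
from enum import Enum


class ChangeCategory(Enum):
    SEMANTIC = "semantic"
    STRUCTURAL = "structural"


class Operation(Enum):
    """Change operations that result from a change."""

    CREATE = "create"
    MODIFY = "modify"
    DELETE = "delete"


# Assumed mapping of change types to the two categories.
CHANGE_TYPES = {
    "knowledge and terminology": ChangeCategory.SEMANTIC,
    "practice": ChangeCategory.SEMANTIC,
    "organisational": ChangeCategory.SEMANTIC,
    "policy": ChangeCategory.STRUCTURAL,
    "requirement": ChangeCategory.STRUCTURAL,
    "technology": ChangeCategory.STRUCTURAL,
    "other entities": ChangeCategory.STRUCTURAL,
    "dependency": ChangeCategory.STRUCTURAL,
}


def requires_structural_follow_up(change_type: str, significant: bool) -> bool:
    """A significant semantic change leads to structural changes."""
    return CHANGE_TYPES[change_type] is ChangeCategory.SEMANTIC and significant


# A significant change in community practice should trigger structural changes,
# an insignificant one should not.
follow_up = requires_structural_follow_up("practice", significant=True)
no_follow_up = requires_structural_follow_up("practice", significant=False)
```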

4.4.3. State of the art

Change management

The term Change Management is usually used to mean either (i) the approach used to bring about a desired change, or (ii) the processes used to manage changes to the scope of a project.

In PERICLES, we are trying to respond to change, but we will also want to bring about change (to the

digital ecosystem in response to changes in the wider environment). Besides digital preservation

approaches and lifecycle models there are a few approaches which try to define in general how to

deal with change over time. For PERICLES these approaches are important because preservation too often focuses only on static conservation. However, it is not useful or necessary in every context or scenario that a digital object looks and behaves exactly as it did at a prior point in time.

● IT governance standards like COBIT75 and ITIL76 address change and process management as

part of the normal activities of maintaining an IT infrastructure. COBIT addresses this as part

of their domain "Acquire and Implement" and ITIL as part of "Service Support" or "Service

Transition". Unfortunately these standards are very abstract, expensive and are often

updated with non-backward compatible terminology.

In these IT standards the change always originates from a user who creates a new change request. Then a standardised process takes place. There is no automatic change detection.

● SCAPE has produced a deliverable that describes a preservation watch which does not only include file format obsolescence. It might be worth elaborating how SCAPE monitors change in the deliverable D12.1 [77]. What is missing in this deliverable is what the change triggers

75 See http://www.isaca.org/COBIT/.

76 See http://www.itil-officialsite.com.

look like and how they get the data for classification. There is a description of a repository,

but no description of the rules.

● The SCAPE project includes an activity described as Preservation Watch. They aim to “provide automated mechanisms to support the monitoring and evolution of preservation plans over the lifecycle of digital content and react to a dynamically changing environment and user behaviour” [77]. This is an important aspect of the change management approach adopted by the project.

● They propose a Watch Service comprising components that aim to keep track of external factors by storing a local representation of the wider world. These components use a variety of information sources to build and maintain this representation including “format registries and the SCAPE preservation component catalogue; policy models; repositories; experiment results; content profiles; human knowledge; … a snapshot service comparing web page renderings to baseline rendering snapshots, and a simulator for assessing the effects of planning and watch decisions” [77]. The deliverable also includes a list of triggers that cause action to be taken in response to certain changes. SCAPE focuses on the creation of automated mechanisms and accepts that some “manual changes” may not be fully managed by their Watch process. The local representation of all the information known to the system is referred to as a knowledge base, and the changes of interest are specified by “Watch Requests”, each of which specifies a trigger. A trigger is associated with one or more notifications, a specific condition, and possibly a preservation plan.

● The PLANETS project was a predecessor of SCAPE. Amongst other things, it developed the Plato tool for preservation planning. This tool is still being developed.77 Until recently Plato offered little or no functionality to directly address change, but version 4.4 (released March 2014) enables the creation of monitoring conditions, including triggers for a preservation watch system called Scout.78

● The National Archives of the UK promote the concept of Digital Continuity which is broader

than digital preservation. "Digital continuity is the ability to use digital information in the way

that you need, for as long as you need." [78] [79] [80]
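The relationship between a knowledge base, watch requests and triggers described in the SCAPE items above can be sketched as follows. This is only an illustration of the concept under simplifying assumptions: it is not the SCAPE or Scout API, and the names `Trigger`, `WatchRequest`, `evaluate`, the knowledge base layout and the notification address are all invented for the example.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional

# Assumption: the knowledge base is a plain dictionary of observed facts.
KnowledgeBase = Dict[str, Any]


@dataclass
class Trigger:
    """A condition with one or more notifications and an optional plan."""

    condition: Callable[[KnowledgeBase], bool]
    notifications: List[str]
    preservation_plan: Optional[str] = None


@dataclass
class WatchRequest:
    """Specifies a change of interest by means of a trigger."""

    description: str
    trigger: Trigger


def evaluate(requests: List[WatchRequest], kb: KnowledgeBase) -> List[str]:
    """Return the notifications of all triggers whose condition holds."""
    fired: List[str] = []
    for request in requests:
        if request.trigger.condition(kb):
            fired.extend(request.trigger.notifications)
    return fired


# Hypothetical facts and a request that fires when a format loses tool support.
kb = {"tools_supporting_format": {"TIFF": 5, "WordPerfect": 0}}
requests = [
    WatchRequest(
        description="Format without any supporting tool",
        trigger=Trigger(
            condition=lambda k: k["tools_supporting_format"]["WordPerfect"] == 0,
            notifications=["curator@example.org"],
        ),
    )
]
notified = evaluate(requests, kb)
```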

Lifecycle management

Lifecycle models usually provide a framework for the sequential actions like creation, productive use,

modification and disposal for the management of the entity whose lifecycle they model. Typical are

data or specifically research data lifecycle models, but lifecycle models as such are not limited to

data. In the same way as for data, it is possible to model the lifecycle of services, policies or other entities.

Difficulties of lifecycle models are that they usually assume that the actions or phases included are

repeated (the "cycle") and they often do not explain well what this repetition means. They also

suggest a linear sequence of phases and activities which actually might be non-linear or even chaotic.

The applicability of lifecycles for modelling change in a preservation ecosystem might be limited

because they focus on the "life" of a single thing or entity only.

77 See http://www.ifs.tuwien.ac.at/dp/plato/

78 See https://github.com/openplanets/scout

Good overviews of research data lifecycle models are provided by Ball "Review of Data Management Lifecycle Models" [81] and the CEOS Working Group on Data Life Cycle Models and Concepts [82].

The DCC lifecycle model [83] is probably the most established lifecycle model for research data and is quite detailed. It describes seven sequential actions and four full lifecycle actions. The emphasis here is on the preservation and curation activities; “Access, Use and Reuse” are combined into a single activity.

The Web Archiving Life Cycle Model includes policy as a topic and integrates organisational and object related tasks. It is of course focused on web archiving and it (possibly wrongly) conflates organisational and object related tasks [84].

The WF4EVER project is working on workflow lifecycles and the preservation of workflows [85]. They describe a research object lifecycle focused around three main states of the object: live, published and archived.

The UK Data Archive also describes a research data lifecycle [86]. This comprises six sequential activities and, unlike the DCC’s model, it is more focused around the data user’s perspective of what happens to the data. In the UKDA’s model, Preserving Data is only one of the 6 stages. The other 5 stages correspond broadly to the DCC’s “access, use and reuse”.

The Data Documentation Initiative (DDI)79 is an effort to create an international standard for describing data from the social, behavioural, and economic sciences. Expressed in XML, the DDI metadata specification now supports the entire research data life cycle. DDI metadata accompanies and enables data conceptualization, collection, processing, distribution, discovery, analysis, repurposing, and archiving.

Pepe et al (2010) [87] describe an implementation of the Open Archives Initiative’s Object Reuse and Exchange data model (OAI-ORE). Here they collect information about multiple stages in the research process, and use standard Semantic Web practices to identify and link to the objects that were used in the research including publications, data and contextual research information.

Records management lifecycle

The term record is defined in ISO 15489-1:2001 as “Information created, received, and maintained as evidence and information by an organization or person, in pursuance of legal obligations or in the transaction of business”[88]. So a record is not a generic document, but an artefact to document business activities of an organisation for legal reasons.

Records management is defined in ISO 15489-1:2001 as “Field of management responsible for the efficient and systematic control of the creation, receipt, maintenance, use and disposition of records, including processes for capturing and maintaining evidence of and information about business activities and transactions in the form of records.”[88]

Records management follows defined steps. There are different models and each country has its own laws on retention periods for records, but some steps are common. First the records are created. A record can take different forms and need not necessarily be a document. In this stage records are active and are distributed and updated. This phase can last from a few hours to a few years. Some records are stored for legal or administrative reasons and some are disposed of after their use, before they enter an archive. In the final stage some records are selected for archiving for future use and research. Others are destroyed after the retention period is over [89].

There are many common requirements between long-term preservation and records management, but there are also some important differences in emphasis. Records serve as proof of work and decisions. This is not necessarily true for other long term archive material, which can have many origins and can be of value for future use.

Also records management is typically undertaken over shorter timescales, which are often fixed in advance (for instance, by law). The timespan of long term archives is not defined, which means as long as possible.

79 See Data Documentation Initiative, http://www.ddialliance.org/.

Often, once the retention period has passed, records are deliberately destroyed, either for reasons of cost or for compliance with data protection laws. Sometimes, instead of being destroyed, the records are subsequently archived or preserved with more standard long-term preservation practices.

For LTA, it depends on the step at which the repository or archive becomes involved. Often the archive or repository is the last link in the chain: the valuable artefacts have already been selected and will not be deleted.

4.5. Dependencies

4.5.1. Definition

“Given objects A and B. A is dependent on B if changes to B have a significant impact on the

state of A, or if changes to B can impact the ability to perform function X on A.”

4.5.2. Types of dependencies

PERICLES aims to model and analyse several distinct types of change within a digital ecosystem. Therefore it is not sufficient to simply establish that an entity A depends on entity B; the dependency needs to be clearly specified.

Figure 6 shows the assumed dependency types. The diagram should be read as a decision map: it starts with the assertion that a dependency exists. First we determine if the dependency is hard, soft or fuzzy. Then we determine the direction of the dependency. After that, we add any other property that fits, such as condition, ranking or prediction, to specify the dependency in more detail. The properties can be added multiple times.

Example:

Consider an external XML notification service about new items in an archive. The dependency of the XML notification service on the archive is hard. This means that the link between the notification service and the archive must exist to perform the desired functionality. The dependency direction is unidirectional because the technical system does not depend on the service. The service has an average availability of 99.8%; the type of this condition describes the quality of service. Also the response time of the notification service should be less than 10 seconds. This condition has the same type and class. The probability of change of this service is considered to be 20% within the next five years.
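The decision map and the example can be written down as a small data structure. This is a sketch under the assumption that a dependency is recorded with a binding type, a direction and optional further properties; the class and field names (`Dependency`, `Condition`, `Prediction`, and so on) are hypothetical and not part of the PERICLES model.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Binding(Enum):
    HARD = "hard"
    SOFT = "soft"
    FUZZY = "fuzzy"


class Direction(Enum):
    UNIDIRECTIONAL = "unidirectional"
    BIDIRECTIONAL = "bidirectional"


@dataclass
class Condition:
    kind: str          # e.g. "quality of service"
    description: str


@dataclass
class Prediction:
    probability_of_change: float  # value between 0 and 1
    horizon_years: int


@dataclass
class Dependency:
    dependent: str
    depends_on: str
    binding: Binding
    direction: Direction = Direction.UNIDIRECTIONAL
    # Properties such as conditions can be added multiple times.
    conditions: List[Condition] = field(default_factory=list)
    prediction: Optional[Prediction] = None


# The XML notification service example expressed with these classes.
notification_dependency = Dependency(
    dependent="XML notification service",
    depends_on="archive",
    binding=Binding.HARD,
    direction=Direction.UNIDIRECTIONAL,
    conditions=[
        Condition("quality of service", "average availability of 99.8%"),
        Condition("quality of service", "response time below 10 seconds"),
    ],
    prediction=Prediction(probability_of_change=0.2, horizon_years=5),
)
```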

Figure 6: Types of dependencies

| Dependency | Classification | Description | Values | Example |
| --- | --- | --- | --- | --- |
| Hard | Binding type | A dependency is hard if something does not work without the existence or presence of another entity. | hard | A stored Digital Object cannot be accessed without a Technical Service. |
| Soft | Binding type | The existence of a dependency offers advanced functionality. A broken dependency results in a system that still works, but without a particular feature. | soft | When a user views a document in a word processor and the selected font is not installed, the application selects a default font. The document is still readable and editable, with the loss of the original look. |
| fuzzy affinity | Binding type | This type of fuzziness expresses the degree to which a dependency exists or belongs to an entity (membership). The degree is expressed in percentage terms [90, pp. 68–79]. This type is also called a fuzzy set. | 0..1. The number is a percentage value and expresses the degree of affinity. | An item costs 200 Euro and has a degree of 0.6 that it belongs to the class expensive and a degree of 0.4 that it belongs to the class cheap. |
| fuzzy range | Binding type | The fuzzy range defines a range within which the behaviour is acceptable. The transition is flexible and need not be a constant linear function [90, pp. 68–79]. The term fuzzy range conforms to what is known as fuzzy logic. | | Good usability of the system, the term designated user community, fast response of a system (can be subjective). |

The following items are not dependencies in their own right. They can be added to the dependency types above to specify the dependency in more detail.

| Dependency | Classification | Description | Values | Example |
| --- | --- | --- | --- | --- |
| implicit | Binding type | This dependency describes assumptions and knowledge that are normally not written down. These dependencies are not easy to determine. | | To be able to read this document you must know English and have the background knowledge to know the meaning of the technical vocabulary. |
| unidirectional | Direction | Entity A depends on entity B but B does not depend on A. | Unidirectional is the default. The dependency link is modelled with one arrow. | The function that displays metadata from a digital object is unidirectional. There is no interaction from Digital Objects to the function involved. |
| bidirectional | Direction | Entity A depends on entity B and B depends on A. | The dependency link is modelled with arrowheads in both directions. | For example a backup system is bidirectional. In normal operation the creation of the backup is unidirectional. As soon as it is restored it becomes bidirectional. |
| quality of service (QOS) | Condition | QOS defines quality assurance aspects. The dependency is broken if the QOS factor is not met (even if the technical dependency is still alive). | The property can be fuzzy. The same principles described for fuzzy dependencies apply. | Response time must be under x seconds. Bandwidth must be sufficient to enable video streaming at a given quality. There have to be at least 3 copies of a file at different locations. |
| weight | Ranking | Gives a dependency connection a weight. Higher values have priority: the dependency is more important or stronger than others. At this stage no statement can be made on how the weight is calculated. This will be done later. | 0..1 decimal numbers; a lower value means less important, a higher value means more important. | Digital files have a strong dependency on the storage. The technical availability of the end user interface is more important than that of the admin interface. |
| probability of change within a period | Prediction | This represents the estimated probability of change. The estimate is always based on a given time frame (30 days, 3 years …) and for each time frame there could be a different probability of change. A typical process lifecycle shows that during the first iterations quite a lot of changes are to be expected, and after several months it settles and becomes more stable. The method for calculating the weights will be specified in a later task. | Time frame value: days, months, years, with a value from 0..1 which expresses an estimation of change. | A new law is in discussion and could pass legislation within 1 year. After that change we are no longer allowed to store certain attributes. This has an impact on some relations. |

Table 11: Description of dependencies
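The fuzzy affinity entry of Table 11 can be illustrated with a small membership function. The sketch below is an assumption for illustration only: the deliverable does not define how the degree is computed, and the linear ramp as well as the thresholds of 50 and 300 Euro are hypothetical values chosen so that the 200 Euro item from the table receives the degrees 0.6 (expensive) and 0.4 (cheap).

```python
def expensive_degree(price, cheap_limit=50.0, expensive_limit=300.0):
    """Degree (0..1) to which a price belongs to the class 'expensive'.

    Hypothetical linear ramp: 0 at or below cheap_limit, 1 at or above
    expensive_limit, linear in between.
    """
    if price <= cheap_limit:
        return 0.0
    if price >= expensive_limit:
        return 1.0
    return (price - cheap_limit) / (expensive_limit - cheap_limit)


def cheap_degree(price, **limits):
    """Complementary degree to which the price belongs to the class 'cheap'."""
    return 1.0 - expensive_degree(price, **limits)


price = 200.0
print(round(expensive_degree(price), 2), round(cheap_degree(price), 2))
```

As the fuzzy range entry notes, the transition need not be linear; any monotone function between the two limits could be substituted.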

Note on weight/importance and prediction:

For the classification it is just important that the type’s weight and importance can constrain a

dependency. The calculation of the weights will be discussed in future deliverables.

4.5.3. State of the art

TIMBUS

The TIMBUS project aims to capture and preserve an existing business process. The dependencies it defines are therefore descriptive and help to model the context or environment of elements of the process. There is a defined set of applicable classes [91, pp. 65–72]. TIMBUS annotates each dependency with a constraint. There are about 40 defined constraints, such as hasAssociation, hasConfiguration, hasCreator, hasEncryption, hasName and hasRedundancy. The inverse direction of each dependency is also defined (e.g. hasFormat -> isFormat) [91, pp. 109–110].

An example of such a dependency graph is shown in Figure 7.

Figure 7: example of TIMBUS vocabulary to describe the surrounding of a digital image


This view differs from the classification of dependencies in PERICLES. PERICLES considers not only the direct interactions of digital objects with environment entities, as in TIMBUS, but also the relevant interactions between the environment entities themselves.

Each relation from the TIMBUS catalogue (e.g. hasFormat) can be specified as a functional, transitive, symmetric, asymmetric, reflexive or irreflexive dependency. Functional declares a relation only between two entities. Transitive relations arise by inference (if A depends on B and B depends on C, then A is transitively dependent on C). Symmetric marks a dependency that is navigable in both directions, whereas an asymmetric dependency is navigable in only one direction. Reflexive denotes relations that can be applied to an element itself, and irreflexive denotes dependencies that cannot relate an element to itself.

In PERICLES, we are primarily concerned with dependencies that are transitive, symmetric and

asymmetric (called bidirectional/unidirectional).
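These relation properties can be checked mechanically once the dependencies are written as pairs. The following Python sketch uses the standard mathematical definitions over a set of (source, target) pairs; the function names and the sample entities A, B and C are assumptions for illustration and are not taken from TIMBUS or PERICLES.

```python
def elements(relation):
    """All entities that occur in a set of (source, target) pairs."""
    return {e for pair in relation for e in pair}


def is_symmetric(relation):
    """Every dependency is navigable in both directions."""
    return all((b, a) in relation for a, b in relation)


def is_asymmetric(relation):
    """No dependency is navigable in the reverse direction."""
    return all((b, a) not in relation for a, b in relation)


def is_reflexive(relation):
    """Every entity is related to itself."""
    return all((e, e) in relation for e in elements(relation))


def is_irreflexive(relation):
    """No entity is related to itself."""
    return all(a != b for a, b in relation)


def is_transitive(relation):
    """Whenever A -> B and B -> C are present, A -> C is present too."""
    return all((a, d) in relation
               for a, b in relation
               for c, d in relation
               if b == c)


# Hypothetical dependency pairs: (A, B) means "A depends on B".
unidirectional = {("A", "B"), ("B", "C"), ("A", "C")}
bidirectional = {("A", "B"), ("B", "A")}

print(is_asymmetric(unidirectional), is_transitive(unidirectional))
print(is_symmetric(bidirectional), is_irreflexive(bidirectional))
```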


5. Modelling of the digital ecosystem

Chapter 4 has presented the generic types of entities, their properties and a description of

dependency and change factors. This chapter outlines the procedure for using the entities to model a

digital ecosystem, and specifically the creation of a dependency graph.

This chapter will take you through the steps that are necessary to build a graph. It will also introduce

the graphical notation of the dependency graph. Chapter 4 has described the elements of a graph,

but not the graphs themselves. Chapter 6 presents examples on how to model a digital ecosystem

with the application of this procedure.

5.1. Preparatory tasks

As part of the modelling, user activities will be mapped to different systems and processes in an organisation. Therefore it is necessary to know the structure of the system landscape. The system landscape is a static view that gives an overview of all processes, actors and information systems and the connections between them. This is also known as Enterprise Architecture (EA) modelling. If there is an existing plan, it can be reused; otherwise the landscape should be modelled. If there is no model we recommend ArchiMate (see footnote 80), which is an EA modelling language. ArchiMate is an open standard overseen by the Open Group and has the benefit that it allows the processes and structure of the whole organisation to be modelled.

Alternatives are some UML diagram types, for example component and deployment diagrams (see footnote 81). These UML diagram types are very software-centric and do not take into account the business activities of the organisation, so they are not the best way to model a system landscape.

Another alternative for modelling a system landscape is SysML, which offers some suitable diagram types. SysML is based on UML but was created for systems engineering, which is broader than software-centric UML. The block definition diagram is suited to modelling the structure of the system. It allows blocks of different types to be linked: company organisation, hardware, software, users, processes etc. The blocks can be very generic or specified in more detail. It is possible to create internal block diagrams that show what happens inside blocks. Nevertheless, the result still looks more technically oriented.

The examples in chapter 6 use ArchiMate (see footnote 82). It helps to structure the user stories and map them to the real entities. This deliverable will not go into further detail on how to model a system landscape with this graphical notation.

80 See http://www.opengroup.org/subjectareas/enterprise/archimate.

81 See http://www.uml-diagrams.org/uml-25-diagrams.html.

82 An introduction to ArchiMate can be found at http://www.opengroup.org/archimate/2.0/ArchiMate2_intro.pdf, http://www.archimate.nl/content/bestanden/archimate_made_practical_2008-04-28.pdf and http://www.masteringarchimate.com.


5.2. Identify user activities

The first step is to make a list of current user communities that use the services or systems in question. The list should then be reviewed to identify the main activities the users perform. A user survey can support the creation of the list.

Another approach is to list all the services and activities that are currently offered and assign them to the user communities. This can also be done by producing a list of use cases that are attributed to the different communities.

A ranking based on the usage frequency of the services is a good indication of how important a

certain service could be to a given user community.

5.2.1. Restructure activity to the user perspective if necessary

The actions or user stories of a designated user community should be user-centric, i.e. expressed from the user perspective. They should state which functionality a user wants to perform and what value it provides for the user's work.

5.2.2. Refine activity description

The description must be detailed enough to cover relevant parts of the system. A short description

like “user group x needs a pie chart overview of categories” is not sufficient to understand the

activity in full detail. The activity must be detailed enough to map the steps to the systems which

enable production of the desired output.

A model of the system landscape expressed in ArchiMate, which includes the business processes and the environments, helps to structure the user activity description. Parts of the use case, or its flow, can also be modelled with BPMN if a more detailed description is desired. Such a model shows how the information flow is organised between the institution and technical services and which systems and other information are involved.

5.3. Identify entity types

It should now be possible to make a list of the involved entity types that are necessary to perform the functionality of the user activity. The list contains the five different entity types (user, digital object, technical service, policy, process). The instances of each entity type should be specified more precisely, for example Technical Service: storage system, web application; Digital Object: scanned image, textual description; and so on.

5.4. Model the dependencies

With the preliminary steps complete, it is now possible to build the entity graph. The following procedure is suggested to create the graph:


5.4.1. Graphical notation

We create a property graph. That means that vertices and the edges can have multiple properties. As

a template for the graphical notation a UML class diagram can be reused. This has the benefit that

common modelling tools, known as CASE tools, can be used for the graphical notation. We have no

classes, but instances of the five entity types. To identify the entity types the following shortcuts are

defined:

US: User

PS: Process

TS: Technical Service

PO: Policy

DO: Digital Object

In addition to these shortcuts the following colours can be used to distinguish the different entity

types:

User: purple for the designated user community, grey for any other internal user (e.g. system operators)

Process: yellow

Technical Service: red

Policy: green

Digital Object: blue

The entities are linked to each other with lines that carry one or two arrowheads. One arrowhead marks a unidirectional relation; two arrowheads mark a bidirectional relation. Any other property of a dependency is attached to the edge.

5.4.2. Vertex properties

All entities need to be identified and the properties defined in chapter 4 captured for each entity type. Some properties are name-value pairs, for example name = XYZ, while other properties such as sensitivity are a kind of category. For example, it might be sensible to estimate how sensitive an entity is to technology changes.

5.4.3. Edge properties

The properties on an edge are a detailed description of the dependency of one entity on another entity. Chapter 4.5.2 contains a decision chart and a table with which the properties of the dependency can be identified. The result is a list of properties in the form of name-value pairs (e.g. binding type = hard, probability of change within 1 year = 0.2, etc.). If a dependency property is not applicable, it can be omitted.

Edge properties for user entities can be left out. Normally a user has a hard dependency on a technical service because the user wants to perform a user activity.
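The notation described in this chapter can be sketched as a small property graph. The Python fragment below is a minimal illustration under stated assumptions: the data layout, the helper names `label` and `describe` and the three sample instances are hypothetical; the shortcuts, colours and the sample name-value pairs come from the text above.

```python
# Shortcuts and colours follow the graphical notation in this chapter.
ENTITY_TYPES = {
    "US": ("User", "purple"),   # grey would be used for internal users
    "PS": ("Process", "yellow"),
    "TS": ("Technical Service", "red"),
    "PO": ("Policy", "green"),
    "DO": ("Digital Object", "blue"),
}

# Vertices: hypothetical instances, each with a type shortcut and a name.
vertices = {
    "scientist": {"type": "US", "name": "Scientist"},
    "webapp": {"type": "TS", "name": "Web application"},
    "image": {"type": "DO", "name": "Scanned image"},
}

# Edges: (source, target, properties); inapplicable properties are omitted.
edges = [
    ("scientist", "webapp", {}),  # edge properties for users can be left out
    ("image", "webapp", {
        "binding type": "hard",
        "direction": "unidirectional",
        "probability of change within 1 year": 0.2,
    }),
]


def label(vertex_id):
    """Shortcut plus name, e.g. 'TS Web application'."""
    vertex = vertices[vertex_id]
    return f'{vertex["type"]} {vertex["name"]}'


def describe(edge):
    """Render one edge as text, with one or two arrowheads for the direction."""
    source, target, props = edge
    arrow = "<->" if props.get("direction") == "bidirectional" else "->"
    details = ", ".join(f"{k} = {v}" for k, v in props.items() if k != "direction")
    text = f"{label(source)} {arrow} {label(target)}"
    return f"{text} [{details}]" if details else text


for edge in edges:
    print(describe(edge))
```

Keeping the properties in a plain dictionary mirrors the rule that any property which does not apply is simply left out.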


5.5. Possible reasoning from the graph

Reasoning on the graph is a task of the next deliverable, but the following list contains some ideas for the kinds of reasoning that could be performed:

● Single point of failure analysis: problematic entities can be found that have critical dependencies and where a failure can have a serious impact on the digital ecosystem.

● Reduction of possible side effects when an entity is changed.

● Estimation of the lifetime of dependencies to identify problematic entities.

● Comparison of digital ecosystems by comparing their graphs and using a metric for similarity.

● Determination of transitive dependencies. Transitive dependencies are dependencies that are not explicitly modelled and can arise from the coupling of elements. Whenever an element A depends on element B and element B in turn depends on element C, then element A also depends on element C.
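Two of these ideas, the determination of transitive dependencies and the search for a single point of failure, can be sketched in a few lines of Python. This is an illustrative assumption rather than the method of the next deliverable: the graph layout, the function names and the entities A to D are hypothetical, and "most dependents" is only one simple way to rank candidates.

```python
# Hypothetical graph: each key depends directly on the entities in its list.
depends_on = {
    "A": ["B"],
    "B": ["C"],
    "C": [],
    "D": ["C"],
}


def transitive_dependencies(graph, start):
    """All entities that `start` depends on, directly or transitively."""
    found = set()
    stack = list(graph.get(start, []))
    while stack:
        entity = stack.pop()
        if entity not in found:
            found.add(entity)
            stack.extend(graph.get(entity, []))
    return found


def dependents(graph, entity):
    """All entities that would be affected if `entity` failed."""
    return {e for e in graph if entity in transitive_dependencies(graph, e)}


def most_critical(graph):
    """Entity with the most dependents: a candidate single point of failure."""
    return max(sorted(graph), key=lambda e: len(dependents(graph, e)))


print(transitive_dependencies(depends_on, "A"))
print(most_critical(depends_on), dependents(depends_on, most_critical(depends_on)))
```

The same `dependents` function also indicates which entities might show side effects when a given entity is changed.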


6. Exemplary application

This chapter applies the steps from chapter 5 to a concrete example to show how to model the ecosystem. The first example is a simplified abstraction of the space science scenario.

The space science data are produced by experiments executed on the SOLAR instrument (scientific

payload) on the ISS. The experiments are subject to external processes and influences such as

changes to other ISS systems and vehicles arriving. The running and monitoring of the experiments is

carried out by the mission operator at the ground station. The execution of the experiment is

governed by policies (flight rules). The raw science data and telemetry are captured and stored at the

ground station. The mission operators create logs that describe the execution of the experiments

and are related (but not explicitly linked) to the telemetry and data. The SOLAR scientists use the

data, telemetry and logs to perform calibrations and scientific analysis experiments.

Step 1: preliminary work

The system landscape of the scenario is modelled with ArchiMate (Figure 8) and as a SysML block diagram (Figure 9). This example will not model the whole landscape of the ISS experiment and ground station functionality; it will be limited to the parts that are necessary to perform the described functions.


Figure 8: Science example landscape expressed with ArchiMate

Figure 9: Science example landscape expressed with a SysML block diagram


Step 2: identify user activities

In this scenario we can identify two types of users, the mission operator and the SOLAR scientists. As chapter 5.2.1 describes, we are interested in the activities that the designated user community performs. In this case the mission operator is not part of the designated user community, but rather an internal actor of the system. The mission operator will be modelled, but is considered an internal part of the system.

The designated user community for this example consists of SOLAR scientists who want to access the data that has been produced by experiments on the ISS.

Step 3: restructure user activities

The space scientist is the user we are interested in. Therefore we rewrite the activity from the view of a space scientist:

A space scientist wants to access the data that has been produced by the experiments carried out on

the space station. The scientist wants to analyse the data and perform calibration experiments.

Step 4: refine activity description

In this step the user activities can be refined or expressed in more detail. For this introductory example a short BPMN model of the user activity has been created. It shows that the scientist waits for the data of the experiment, executes the calibrations and then decides whether the data is sufficient for the intended purpose. The scientist then decides whether a new experiment should be planned. In addition it is likely that the researcher will document the results.

Figure 10: Scientist user activity expressed as BPMN


Step 5: identify entity types

It is now possible to identify the instances of the involved entities according to the scenario

description, user activity and architecture model:

Technical Service (TS)

● SOLAR instrument – generates data and telemetry.

● Store data and telemetry – database.

● Calibration experiments – the system the SOLAR scientist interacts with

● Mission control service – the frontend for the mission operator

Process (PS)

● Run/monitor experiment – execution of experiments on SOLAR instrument.

● Create logs – manual process of creating logs and reports.

● Calibration – calibration of science data.

● External factors - these are processes that have an impact on the data produced by the

instrument and may be out of direct control of the mission operator.

Policy (PO)

● A set of policies relating to execution of experiments on ISS, e.g. flight rules.

User (US)

● Mission operator.

● Scientist.

Digital Object (DO)

● Data and telemetry – science data, associated telemetry and telecommands.

● Logs – console logs and other reports created by the mission operator.
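The entity list above translates directly into the vertices of the graph. The following Python sketch is one way to do this under stated assumptions: the data layout and the names `entities`, `DESIGNATED_USERS` and `colour` are hypothetical, and the single policy vertex "Flight rules" stands in for the whole set of policies; the instances, shortcuts and colours are taken from the text.

```python
# Entity instances from step 5, keyed by the entity type shortcut.
entities = {
    "TS": ["SOLAR instrument", "Store data and telemetry",
           "Calibration experiments", "Mission control service"],
    "PS": ["Run/monitor experiment", "Create logs", "Calibration", "External factors"],
    "PO": ["Flight rules"],
    "US": ["Mission operator", "Scientist"],
    "DO": ["Data and telemetry", "Logs"],
}

COLOURS = {"PS": "yellow", "TS": "red", "PO": "green", "DO": "blue"}
DESIGNATED_USERS = {"Scientist"}  # the mission operator is an internal actor


def colour(shortcut, name):
    """Colour of a vertex according to the notation in chapter 5.4.1."""
    if shortcut == "US":
        return "purple" if name in DESIGNATED_USERS else "grey"
    return COLOURS[shortcut]


# One vertex per entity instance, identified by shortcut plus name.
vertices = [
    {"id": f"{shortcut} {name}", "type": shortcut, "colour": colour(shortcut, name)}
    for shortcut, names in entities.items()
    for name in names
]

for vertex in vertices:
    print(vertex["id"], "-", vertex["colour"])
```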

Step 6: model the dependencies

In step 5 the entities have been identified. Now the entities are put into a graph to model the

dependencies of the entities. Each entity is a vertex on the graph. A vertex has some attributes

depending on the entity type. Chapter 4.2 shows which attributes are necessary. The next task is to

identify the type of dependency between the entities, but first it must be determined if there is a

dependency. As an aid the activity description and the model of the system landscape can be used.

Chapter 4.5.2 describes which kind of dependencies exist and what values they need. The properties

are written beneath the edge.

Figure 11 shows the modelled dependencies and their properties. The entities have been coloured as suggested in chapter 5.4.1.


Figure 11: science dependencies expressed as property graph

6.1. First reasoning on the graph

The following are examples of conclusions that can be drawn from analysis of the dependency graph.

● External factors on the instrument can have an impact on the science data calibration and

logs. There is a tolerable range wherein the experiments can be executed. External factors

can be the position of the ISS, position of the sun, ignition of boosters and many other

external processes.

● Changing the log creation process has no impact on the science data or telemetry.

● Changes to the process of running experiments can only be done in compliance with

appropriate policies.

● Changes in the calibration process have no impact on the data, telemetry or logs.

● Changes to the mission operator role can impact the creation of logs and the running and

monitoring of experiments. However, there is no impact on the flight rules.


7. Conclusion, recommendations and outlook for digital ecosystem management

This initial report on preservation ecosystem management introduced the notion of a digital

ecosystem as a theoretical framework in which preservation issues can be analysed. The report

provided an overview of the preservation relevant entities of an ecosystem, which dependencies

exist between them and what kind of change events they are subject to. These entities and their

dependencies can be represented as a dependency graph which can be analysed to identify

preservation related tasks and issues. A step-by-step guide showed how to create a preservation ecosystem model with its individual entities and their specific dependencies. Finally, a few initial low-detail examples of these digital ecosystem models were presented.

The main benefit of using such ecosystem models and their dependency graph representation is that

it prevents the misconception of preservation as an isolated challenge limited to a single entity type

or dependency relation (usually digital objects and their technical obsolescence). It is also intended

to highlight that there are alternative mitigation strategies in this network of entities and

dependencies. These challenges and the options

On the theoretical level the digital ecosystem model approach will be developed further in a couple of aspects. In parallel with the creation of this report, the first stable version of the linked resource model (LRM) has been created. From the perspective of the digital ecosystem approach, the LRM is the formal graph representation of dependencies, which the digital ecosystem model expresses on a business or conceptual level. The next logical step is to create a procedure for translating digital ecosystem models into LRM graphs. These LRM graphs can then be analysed with formal methods such as graph theory to support the analysis and reasoning process.

It is important to make a few general remarks before the future technical work is explained. Digital

ecosystem models are not the blueprint for what a technical infrastructure or even a dedicated

preservation system should look like. Instead they are a snapshot of the current infrastructure and

situation irrespective of whether its aim is to preserve anything at all and irrespective of whether it is

effective or dysfunctional. A dedicated preservation system or a simple repository where digital

objects are preserved only as a matter of secondary importance can be part of this digital ecosystem

but that is not the initial purpose of the digital ecosystem models. The initial purpose of the digital

ecosystem approach is to provide a way to manage them.

But of course the management of a digital ecosystem can be supported by tools and a preservation

system or infrastructure can take the digital ecosystem approach into account. A key component that

will be developed in the coming months is a registry in which the dependency graph representations

of digital ecosystem models can be stored and queried. This is important for example for the efficient

analysis of dependency graphs or the comparison of different versions of a preservation

infrastructure. Central to this registry will be the representation of the dependencies between policies, the processes which implement the policies and the technical services which in turn implement the processes. This makes it possible, for example, to track and plan the evolution of a preservation digital ecosystem and to verify that the infrastructure was in the past, and is in the present, compliant with


the policies. Based on a knowledge base of existing policies and processes, a policy editor will be created to help institutions create and manage preservation policies.
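Comparing two stored versions of a dependency graph could, for instance, work as in the following Python sketch. It is not a description of the planned registry: the graph layout, the function names and the sample entities are hypothetical, and the Jaccard index over edge sets is just one simple similarity metric chosen for illustration.

```python
def edge_set(graph):
    """Turn {entity: [dependencies]} into a set of (source, target) edges."""
    return {(source, target) for source, targets in graph.items() for target in targets}


def compare(old, new):
    """Return added edges, removed edges and the Jaccard similarity of the edge sets."""
    old_edges, new_edges = edge_set(old), edge_set(new)
    union = old_edges | new_edges
    similarity = len(old_edges & new_edges) / len(union) if union else 1.0
    return new_edges - old_edges, old_edges - new_edges, similarity


# Hypothetical versions of a small infrastructure; an edge reads "is implemented by".
version_1 = {"PO retention": ["PS backup"], "PS backup": ["TS tape storage"]}
version_2 = {"PO retention": ["PS backup"], "PS backup": ["TS cloud storage"]}

added, removed, similarity = compare(version_1, version_2)
print("added:", added)
print("removed:", removed)
print("similarity:", round(similarity, 2))
```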

The second major component is a process engine which can operate on the process descriptions in the registry, i.e. use them as an input workflow to be executed, or output a dependency graph representation of a workflow in another format. This can enable a very flexible infrastructure driven by dependency and change management. The aim is to have a testbed where new or historic infrastructure versions can actually be run and compared. One application would be to use dependency graph analysis methods to evaluate whether systems or infrastructures have beneficial dependency graph properties. This is the part where we come close again to a preservation system in the more traditional OAIS sense. However, by explicitly binding a preservation system to the dependency management in a preservation digital ecosystem registry, the life cycle of the preservation system itself becomes manageable, which advances the current state of the art of traditional preservation systems.


8. Bibliography

[1] D. Leffingwell, ‘A User Story Primer’, 17-Dec-2009. [Online]. Available: http://agile.dzone.com/articles/user-story-primer. [Accessed: 31-Jul-2014]

[2] S. Abrams, P. Cruse, and J. Kunze, ‘Preservation is not a place’, Int. J. Digit. Curation, vol. 4, no. 1, pp. 8–21, 2009.

[3] C. Becker, G. Antunes, J. Barateiro, R. Vieira, and J. Borbinha, ‘Modeling digital preservation capabilities in enterprise architecture’, in Proceedings of the 12th Annual International Digital Government Research Conference: Digital Government Innovation in Challenging Times, 2011, pp. 84–93 [Online]. Available: http://www.ifs.tuwien.ac.at/~becker/pubs/becker-dgo2011.pdf

[4] CCSDS, ‘REFERENCE MODEL FOR AN OPEN ARCHIVAL INFORMATION SYSTEM (OAIS)’, Jun-2012. [Online]. Available: http://public.ccsds.org/publications/archive/650x0m2.pdf. [Accessed: 03-Jul-2014]

[5] D. Rosenthal, ‘The Half-Life of Digital Formats’, 24-Oct-2010. [Online]. Available: http://blog.dshr.org/2010/11/half-life-of-digital-formats.html. [Accessed: 30-Jul-2014]

[6] A. Jackson, ‘Format Obsolescence and Sustainable Access’, 13-Jan-2011. [Online]. Available: http://www.openplanetsfoundation.org/node/592. [Accessed: 30-Jul-2014]

[7] G. Antunes, D. Proença, J. Barateiro, R. Vieira, J. Borbinha, and C. Becker, ‘Assessing Digital Preservation Capabilities Using a Checklist Assessment Method’, in 9th International Conference on Preservation of Digital Objects (iPRES 2012), pg, 2010, pp. 266–273 [Online]. Available: http://www.scape-project.eu/wp-content/uploads/2012/11/iPres2012_Assessing.pdf

[8] G. Antunes, J. Barateiro, C. Becker, D. Proença, and R. Vieira, ‘SHAMAN REFERENCE ARCHITECTURE’. 12-Jan-2012 [Online]. Available: http://shaman-ip.eu/sites/default/files/SHAMAN-REFERENCE%20ARCHITECTURE-Final%20Version_0.pdf

[9] ‘About BenchmarkDP’. [Online]. Available: http://benchmark-dp.org/. [Accessed: 30-Jul-2014]

[10] E. Conway, B. Matthews, D. Giaretta, S. Lambert, M. Wilson, and N. Draper, ‘Managing Risks in the Preservation of Research Data with Preservation Networks’, Int. J. Digit. Curation, vol. 7, no. 1, pp. 3–15, Mar. 2012.

[11] Y. Tzitzikas, ‘Dependency Management for the Preservation of Digital Information’. 2007 [Online]. Available: http://users.ics.forth.gr/~tzitzik/publications/Tzitzikas_2007_DEXA.pdf

[12] DOI Foundation, ‘DOI Handbook Glossary of Terms’, 17-Jun-2013. [Online]. Available: http://www.doi.org/doi_handbook/Glossary.html. [Accessed: 03-Jul-2014]

[13] JISC, ‘Definition of Digital Object’, 2010. [Online]. Available: http://www.webarchive.org.uk/wayback/archive/20130726092422/http://blogs.ukoln.ac.uk/jisc-beg-dig-pres/content/what-is-digital-preservation/definition-of-digital-object/ originally on http://blogs.ukoln.ac.uk/jisc-beg-dig-pres/content/what-is-digital-preservation/definition-of-digital-object. [Accessed: 03-Jul-2014]

[14] K. Thibodeau, ‘Overview of Technological Approaches to Digital Preservation and Challenges in Coming Years’, 2002. [Online]. Available: http://www.clir.org/pubs/reports/pub107/thibodeau.html. [Accessed: 03-Jul-2014]

[15] M. Seadle, ‘Archiving in the networked world: authenticity and integrity’, Libr. Hi Tech, vol. 30, no. 3, pp. 545–552, 2012.

[16] Jisc, ‘The significant properties of digital objects’, 2008. [Online]. Available: http://www.jisc.ac.uk/whatwedo/programmes/preservation/2008sigprops. [Accessed: 25-Jul-2014]

[17] A. Dappert and A. Farquhar, ‘Significance is in the eye of the stakeholder’. 2009 [Online]. Available: http://www.planets-project.eu/docs/papers/Dappert_Significant_Characteristics_ECDL2009.pdf

[18] C. Zierau, ‘Package Formats for Preserved Digital Material’, presented at the iPres2012, Toronto, ON, Canada, 2013, pp. 54–62 [Online]. Available: https://ipres.ischool.utoronto.ca/sites/ipres.ischool.utoronto.ca/files/iPres%202012%20Conference%20Proceedings%20Final.pdf

[19] K. Shafer, S. Weuvek, E. Jul, and J. Fausey, ‘Introduction to Persistent Uniform Resource Locators’, 1996. [Online]. Available: https://www.isoc.org/inet96/proceedings/a4/a4_1.htm. [Accessed: 03-Jul-2014]


[20] P. Dragos-Paul, ‘Natural versus Surrogate Keys. Performance and Usability’, Database Syst., vol. II, no. 2/2011, pp. 55–63, 2011.

[21] P. J. Leach, M. Mealling, and R. Salz, ‘RFC4122: A universally unique identifier (uuid) urn namespace’, 2005 [Online]. Available: https://tools.ietf.org/html/rfc4122

[22] A. Dappert and S. Peyrard, ‘Describing Digital Object Environments in PREMIS’, in Proceedings of the 9th International Conference on Preservation of Digital Objects (iPRES2012), 2012, pp. 69–76 [Online]. Available: http://fclaweb.fcla.edu/uploads/iPres2012_Environments_v14.pdf

[23] ‘Fedora: an architecture for complex objects and their relationships’, Proc. Jt. Conf. Digit. Libr., vol. 6, no. 2, pp. 1432–5012, 2006.

[24] R. Gartner, ‘METS as an ‘Intermediary’ Schema for a Digital Library of Complex Scientific Multimedia’, Inf. Technol. Libr., vol. 31, no. 3, pp. 24–35, 2012.

[25] T. Huang, ‘MPEG-21 Digital Item Declaration (DID)’, 2011. [Online]. Available: http://mpeg.chiariglione.org/standards/mpeg-21/digital-item-declaration. [Accessed: 03-Jul-2014]

[26] ‘policy’, Merriam-Webster. [Online]. Available: http://www.merriam-webster.com/dictionary/policy. [Accessed: 03-Jul-2014]

[27] D. Agrawal, Ed., Policy technologies for self-managing systems. Upper Saddle River, NJ: IBM Press/Pearson plc, 2009.

[28] Object Management Group, ‘Object Constraint Language Version 2.4’. Feb-2014 [Online]. Available: http://www.omg.org/spec/OCL/2.4. [Accessed: 03-Jul-2014]

[29] Jorge Lobo, R. Bhatia, and S. Naqvi, ‘A Policy Description Language’. 1999 [Online]. Available: https://www.aaai.org/Papers/AAAI/1999/AAAI99-043.pdf. [Accessed: 03-Jul-2014]

[30] N. Damianou, N. Dulay, E. Lupu, and M. Sloman, ‘The Ponder Policy Specification Language’. 2001 [Online]. Available: http://pdf.aminer.org/000/545/721/the_ponder_policy_specification_language.pdf. [Accessed: 03-Jul-2014]

[31] K. Twidle, N. Dulay, E. Lupu, and M. Sloman, ‘Ponder2: A Policy System for Autonomous Pervasive Environment’. [Online]. Available: https://spiral.imperial.ac.uk/bitstream/10044/1/4335/1/ponder2.pdf. [Accessed: 03-Jul-2014]

[32] OASIS, ‘eXtensible Access Control Markup Language (XACML) Version 3.0’, 22-Jan-2013. [Online]. Available: http://docs.oasis-open.org/xacml/3.0/xacml-3.0-core-spec-os-en.html. [Accessed: 03-Jul-2014]

[33] L. Kagal, ‘Rei Ontology Specifications, Ver 2.0’. [Online]. Available: http://www.csee.umbc.edu/~lkagal1/rei/. [Accessed: 03-Jul-2014]

[34] DMTF, ‘CIM Simplified Policy Language (CIM-SPL)’. 14-Jul-2009 [Online]. Available: http://www.dmtf.org/sites/default/files/standards/documents/DSP0231_1.0.0.pdf. [Accessed: 03-Jul-2014]

[35] H. Boley and M. Kifer, ‘RIF Basic Logic Dialect (Second Edition)’, 05-Feb-2013. [Online]. Available: http://www.w3.org/TR/2013/REC-rif-bld-20130205/. [Accessed: 03-Jul-2014]

[36] N. Beagrie, N. Semple, P. Williams, and R. Wright, ‘DIGITAL PRESERVATION POLICIES STUDY, Part 1: Final Report October 2008’. Oct-2008 [Online]. Available: http://www.jisc.ac.uk/media/documents/programmes/preservation/jiscpolicy_p1finalreport.pdf. [Accessed: 03-Jul-2014]

[37] ERPANET, ‘Digital Preservation Policy Tool’. Sep-2003 [Online]. Available: http://www.erpanet.org/guidance/docs/ERPANETPolicyTool.pdf. [Accessed: 03-Jul-2014]

[38] Research Councils UK, ‘RCUK Common Principles on Data Policy’. [Online]. Available: http://www.rcuk.ac.uk/research/datapolicy/. [Accessed: 03-Jul-2014]

[39] N. Hall, W. Mann, B. Corey, and T. Wilson, ‘Model Language for Research Data Management Policies’, 2013. [Online]. Available: http://www.aserl.org/wp-content/uploads/2013/01/ASERL-SURA_Model_Language_RDM_Policy_Language_FINAL.pdf. [Accessed: 03-Jul-2014]

[40] A. Dappert, B. Ballaux, P. Bright, M. Mayr, and S. van Bussel, ‘Report on policy and strategy models for libraries, archives and data centres’. 09-Jul-2009 [Online]. Available: http://planets-project.eu/docs/reports/Planets_PP2_D3_ReportOnPolicyAndStrategyModelsM36_Ext.pdf. [Accessed: 03-Jul-2014]

[41] G. V. Elstrøm and B. Sierman, ‘Guidance Policy Elements and Preservation Procedure Elements’. 16-May-2014 [Online]. Available: http://wiki.opf-labs.org/display/SP/Policy+Elements

[42] S. Bechhofer, B. Sierman, C. Jones, and G. Elstrøm, ‘D13.1 Final version of Policy specification model’. Jul-2013 [Online]. Available: http://www.scape-project.eu/wp-content/uploads/2013/08/SCAPE_D13.1_UNIMAN_V1.0.pdf. [Accessed: 03-Jul-2014]

[43] UK Data Archive, ‘PRESERVATION POLICY’. 23-Oct-2012 [Online]. Available: http://data-archive.ac.uk/media/54776/ukda062-dps-preservationpolicy.pdf. [Accessed: 25-Jul-2014]

[44] W. Arsac, A. Laube-Rosenpflanzer, A. Lioy, B. Gallego-Nicasio, and Basile, ‘D1.3 - CONCEPT AND ARCHITECTURE OF THE OVERALL SOLUTION’. 16-Feb-2013 [Online]. Available: http://www.posecco.eu/fileadmin/POSECCO/user_upload/deliverables/D1.3_Architecture_v2.0.pdf. [Accessed: 03-Jul-2014]

[45] K. Julisch, R. Breu, M. Farwick, F. Innerhofer-Oberperfler, M. Brunner, and G. Karjoth, ‘D2.1 A Framework for Business Level Policies’. 26-Sep-2011 [Online]. Available: http://www.posecco.eu/fileadmin/POSECCO/user_upload/deliverables/D2.1_Framework_for_Business_Level_Policies_01.pdf. [Accessed: 03-Jul-2014]

[46] S. Paraboschi, M. Arrigoni Neri, S. Mutti, M. Guarnieri, E. Magri, and M.-N. Lepareaux, ‘D2.4 – POLICY HARMONIZATION AND REASONING’. 25-Sep-2012 [Online]. Available: http://www.posecco.eu/fileadmin/POSECCO/user_upload/deliverables/D2.4_Policy_Harmonization_and_reasoning.pdf. [Accessed: 03-Jul-2014]

[47] T. Scholte, M. Arrigoni Neri, and S. Mutti, ‘D3.5 – MODELS TO REFINE THE IT POLICY AT SERVICE LEVEL’. 30-Sep-2012 [Online]. Available: http://www.posecco.eu/fileadmin/POSECCO/user_upload/deliverables/D3.5_Models_to_refine_the_IT_policy_at_service_level_01.pdf. [Accessed: 03-Jul-2014]

[48] M. Albani, V. Beruti, M. Duplaa, and C. Giguere, ‘Long Term Preservation of Earth Observation Space Data European LTDP Common Guidelines’. 04-Jun-2009 [Online]. Available: http://earth.esa.int/gscb/ltdp/EuropeanLTDPCommonGuidelines_DraftV2.pdf. [Accessed: 03-Jul-2014]

[49] ‘iRODS Micro-services’, 12-Dec-2011. [Online]. Available: https://wiki.irods.org/index.php?title=iRODS_Micro-services&oldid=6276. [Accessed: 03-Jul-2014]

[50] ‘XACML Policy Enforcement’, 30-Aug-2012. [Online]. Available: https://wiki.duraspace.org/display/FEDORA37/XACML+Policy+Enforcement. [Accessed: 03-Jul-2014]

[51] S. Waddington, J. Zhang, G. Knight, J. Jensen, R. Downing, and C. Ketley, ‘Cloud repositories for research data – addressing the needs of researchers’, J. Cloud Comput. Adv. Syst. Appl., vol. 2, no. 1, p. 13, 2013.

[52] ‘process’, Cambridge Dictionaries Online. Cambridge University Press [Online]. Available: http://dictionary.cambridge.org/dictionary/british/process_1. [Accessed: 03-Jul-2014]

[53] ‘process’, Oxford Dictionaries. Oxford University Press [Online]. Available: http://www.oxforddictionaries.com/definition/english/process. [Accessed: 03-Jul-2014]

[54] ‘workflow’, Cambridge Dictionaries Online. Cambridge University Press [Online]. Available: http://dictionary.cambridge.org/dictionary/british/workflow. [Accessed: 03-Jul-2014]

[55] ‘workflow’, Oxford Dictionaries. Oxford University Press [Online]. Available: http://www.oxforddictionaries.com/definition/english/workflow. [Accessed: 03-Jul-2014]

[56] J. B. Hill, ‘Do You Understand the Difference Between Workflow and BPM?’, 22-Apr-2010. [Online]. Available: http://blogs.gartner.com/janelle-hill/2010/04/22/do-you-understand-the-difference-between-workflow-and-bpm/. [Accessed: 03-Jul-2014]

[57] K. D. Swenson, ‘Seven Categories to Replace BPM’, 25-Apr-2012. [Online]. Available: http://social-biz.org/2012/04/25/not-to-praise-bpm-but-to-bury-it/. [Accessed: 03-Jul-2014]

[58] E. Deelman, D. Gannon, M. Shields, and I. Taylor, ‘Workflows and e-Science: An overview of workflow system features and capabilities’, Future Gener. Comput. Syst., vol. 25, no. 5, pp. 528–540, May 2009.

[59] M. Sonntag, D. Karastoyanova, and E. Deelman, ‘Bridging the Gap between Business and Scientific Workflows: Humans in the Loop of Scientific Workflows’, 2010, pp. 206–213 [Online]. Available: http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?arnumber=5693919. [Accessed: 02-Jul-2014]

[60] I. J. Taylor, Workflows for e-science scientific workflows for grids. London: Springer, 2007 [Online]. Available: http://public.eblib.com/EBLPublic/PublicView.do?ptiID=337445. [Accessed: 02-Jul-2014]

[61] IBM, Data Processing Techniques: Flowcharting Techniques. 1969 [Online]. Available: http://www.fh-jena.de/~kleine/history/software/IBM-FlowchartingTechniques-GC20-8152-1.pdf

[62] G. Callou, P. Maciel, D. Tutsch, J. Arajo, J. Ferreira, and R. Souz, ‘A Petri Net-Based Approach to the Quantification of Data Center Dependability’, in Petri Nets - Manufacturing and Computer Science, P. Pawlewski, Ed. InTech, 2012 [Online]. Available: http://www.intechopen.com/books/petri-nets-manufacturing-and-computer-science/a-petri-net-based-approach-to-the-quantification-of-data-center-dependability. [Accessed: 02-Jul-2014]

[63] K. Fakhroutdinov, ‘State Machine Diagrams’. [Online]. Available: http://www.uml-diagrams.org/state-machine-diagrams.html. [Accessed: 03-Jul-2014]

[64] K. Fakhroutdinov, ‘Activity Diagrams’. [Online]. Available: http://www.uml-diagrams.org/activity-diagrams.html. [Accessed: 03-Jul-2014]

[65] K. Fakhroutdinov, ‘Use Case Diagrams’. [Online]. Available: http://www.uml-diagrams.org/use-case-diagrams.html. [Accessed: 03-Jul-2014]

[66] R. Bruni, ‘Methods for the specification and verification of business processes: 19 - Event-driven process chains’, 05-Dec-2011 [Online]. Available: http://www.cli.di.unipi.it/~rbruni/MPB-12/19-EPC.pdf. [Accessed: 03-Jul-2014]

[67] OMG, ‘BPMN 2.0 by Example: Version 1.0 (non-normative)’. Jun-2010 [Online]. Available: http://www.omg.org/spec/BPMN/2.0/examples/PDF/10-06-02.pdf. [Accessed: 03-Jul-2014]

[68] Computer Systems Laboratory of the National Institute of Standards and Technology, ‘Integration Definition for Function Modeling (IDEF0)’. 21-Dec-1993 [Online]. Available: http://www.idef.com/pdf/idef0.pdf. [Accessed: 03-Jul-2014]

[69] N. Russell, A. H. M. ter Hofstede, and W. M. P. van der Aalst, ‘new YAWL: Specifying a Workflow Reference Language using Coloured Petri Nets’. [Online]. Available: http://www.yawlfoundation.org/sites/default/files/newYAWL-cpn.pdf. [Accessed: 03-Jul-2014]

[70] S. Strodl, K. Hobel, E. Weigl, and T. Miksa, ‘D4.6: Use Case Specific DP & Holistic Escrow’. 31-Mar-2013 [Online]. Available: http://timbusproject.net/component/docman/doc_download/143-d46m24use-case-specific-dp-a-holistic-escrowpdf. [Accessed: 03-Jul-2014]

[71] P. May and C. Wilson, ‘D2.3 Technical Architecture Report Version 2’. Mar-2014 [Online]. Available: http://www.scape-project.eu/wp-content/uploads/2014/05/SCAPE_D2.3_BL_V1.0.pdf. [Accessed: 25-Jul-2014]

[72] The Open Group, ‘The Open Group Architecture Framework (TOGAF) - Core Concepts’, 2009. [Online]. Available: http://www.togaf.org/togaf9/chap02.html

[73] M. Walker, ‘ArchiMate 2.0 Highlights’, 05-Feb-2012. [Online]. Available: http://www.mikethearchitect.com/2012/02/archimate-20-highlights.html. [Accessed: 25-Jul-2014]

[74] ‘How LOCKSS Works’. [Online]. Available: http://www.lockss.org/about/how-it-works/. [Accessed: 25-Jul-2014]

[75] Kavoussanakis, Manzano, Dima, Baxter, and Håkansson, ‘Introduction to EUDAT for Community Data Managers and the general public’, 27-Jun-2014. [Online]. Available: http://eudat.eu/EUDAT%2BPrimer.html. [Accessed: 25-Jul-2014]

[76] ‘change’, Oxford Dictionaries. Oxford University Press [Online]. Available: http://www.oxforddictionaries.com/definition/english/change. [Accessed: 04-Jul-2014]

[77] K. Duretec, L. Faria, P. Petrov, and C. Becker, ‘D12.1 Identification of triggers and preservation Watch component architecture, subcomponents and data model’. 27-Jan-2012 [Online]. Available: http://www.scape-project.eu/wp-content/uploads/2012/01/SCAPE_D12.1_TUW_V1.0.pdf. [Accessed: 03-Jul-2014]

[78] The National Archives Kew, ‘Managing digital continuity’. [Online]. Available: https://www.nationalarchives.gov.uk/information-management/manage-information/policy-process/digital-continuity/. [Accessed: 03-Jul-2014]

[79] The National Archives Kew, ‘Risk assessment’. [Online]. Available: https://www.nationalarchives.gov.uk/information-management/manage-information/policy-process/digital-continuity/risk-assessment/. [Accessed: 03-Jul-2014]

[80] The National Archives Kew, ‘Risk Assessment Handbook’. 2011 [Online]. Available: https://www.nationalarchives.gov.uk/documents/information-management/Risk-Assessment-Handbook.pdf. [Accessed: 03-Jul-2014]

[81] A. Ball, ‘Review of Data Management Lifecycle Models (version 1.0)’. 2012 [Online]. Available: http://opus.bath.ac.uk/28587/1/redm1rep120110ab10.pdf. [Accessed: 03-Jul-2014]

[82] Committee on Earth Observation Satellites, ‘Data Life Cycle Models and Concepts CEOS Version 1.2’. 2012 [Online]. Available: http://www.ceos.org/images/DSIG/Data%20Lifecycle%20Models%20and%20Concepts%20v13.docx. [Accessed: 03-Jul-2014]

[83] Digital Curation Centre, ‘DCC Curation Lifecycle Model’. [Online]. Available: http://www.dcc.ac.uk/resources/curation-lifecycle-model. [Accessed: 03-Jul-2014]

[84] M. Bragg, K. Hanna, L. Donovan, G. Hukill, and A. Peterson, ‘THE WEB ARCHIVING LIFE CYCLE MODEL’. 2013 [Online]. Available: https://archive-it.org/static/files/archiveit_life_cycle_model.pdf. [Accessed: 03-Jul-2014]

[85] S. Bechhofer, S. Soiland-Reyes, K. Belhajjame, and J. Bhagat, ‘D2.1 Workflow Lifecycle Management Initial Requirements’. 26-May-2011 [Online]. Available: http://repo.wf4ever-project.org/Content/12/D2.1.pdf. [Accessed: 03-Jul-2014]

[86] UK Data Archive, ‘Research Data Lifecycle’. [Online]. Available: http://data-archive.ac.uk/create-manage/life-cycle. [Accessed: 03-Jul-2014]

[87] A. Pepe, M. Mayernik, C. L. Borgman, and H. Van de Sompel, ‘From artifacts to aggregations: Modeling scientific life cycles on the semantic Web’, J. Am. Soc. Inf. Sci. Technol., p. n/a–n/a, 2009.

[88] ‘ISO 15489-1:2001 Information and Documentation - Records Management - Part 1: General’.

[89] The Ohio State University Libraries, ‘Records Lifecycle’, 24-Jul-2013. [Online]. Available: https://library.osu.edu/projects-initiatives/osu-records-management/records-management-overview/records-lifecyle/. [Accessed: 25-Jul-2014]

[90] Z. Ma, Fuzzy database modeling with XML. New York: Springer Science+Business Media, 2005.

[91] J. Thomson, P. Trezentos, M. Romão, and R. Teixeira, ‘D4.2: Dependency Models Iter. 1: Definition of a formalism to express dependencies and relations between technological, business and organizational components and processes’. 30-Mar-2012 [Online]. Available: http://timbusproject.net/component/docman/doc_download/44-d42m12dependencymodelsiter1pdf. [Accessed: 03-Jul-2014]

9. Annex A: Entity descriptions

Dependencies Policy (PO) Technical Service (TS) Process (PS) Digital Object (DO) User (US)

User is dependent on/ uses

● The policies express the view of the system for the designated user community, or give guidance on what the system should achieve for the user’s benefit. Requirements and expectations from users can influence policies.

● Policies can also be used to constrain the user community (e.g. which community has access to what, who can see which item).

● A Technical Service is used by a user.

● A Technical Service forms a gateway to all other entity types, because the user has to use a Technical Service to access the entities.

● A user executes a process and can also be part of the process (user input).

● A User may be an actor of the process, e.g. the preservation staff must manually enter metadata.

● A Technical Service allows the user to modify, create and delete DOs. A user can only operate on DOs through a Technical Service.

● A user has certain expectations of the presentation of DOs. This includes translation of the semantics so that the user can understand a DO.

● A user may depend on the knowledge of other designated user communities.

Digital Object has a dependency on

● Policies manage the behaviour of DOs.

● Policies can describe the meaning of a DO.

● Policies can be a DO.

● A Technical Service is used to render, store and modify a Digital Object.

● Details of the Technical Service used to process a Digital Object may be stored in a Digital Object in order to capture its provenance.

● A process processes Digital Objects during process execution. Processed means read, modified or written.

● A process is itself also a Digital Object.

● A Digital Object can depend on other Digital Objects that represent related information. This information can be hierarchical (class-oriented).

● A user accesses and manipulates Digital Objects through different Technical Services.

Technical Service has a dependency on

● Policies define the behaviour of the system

● Policies are implemented with a Technical Service.

● Policies are executed on a Technical Service.

● Depends on other Technical Services such as hardware and software.

● Depends on other services (e.g. web services, linked data, interfaces)

● A Process is executed inside a Technical Service.

● Technical Services can be exposed for external consumption, so a process can access other Technical Services.

● The implementation of Digital Objects depends on hardware and software or services.

● A Technical Service renders or processes DOs into a form usable by the user, and can apply transformations based on the background knowledge and semantics of the user.

● A Technical Service is used to store a non-volatile version of Digital Objects

● The user cannot interact with Digital Objects directly. A user uses Technical Services to get access to the digital resources.

● The user has an indirect influence on the properties of the systems through their requirements.

● The user expects a certain behaviour and look and feel of the system based on their knowledge, habits and terminology.

Process has a dependency on

● Static role: policies describe the overall flow and the steps of a process. Hence processes are derived from the policies, or at least a process flow can be generated from them.

● Dynamic role: during process execution policies can be invoked. They can constrain the process or give processing guidelines during runtime. In this case policies act like a knowledge base.

● A Technical Service is used to implement a process or part of a process.

● A Process runs on a Technical Service and can provide a Technical Service (e.g. a workflow that exposes an interface as web service).

● The process definition consists of steps (activities) and a plan of how and when these steps are performed. A step can be an atomic task or contain a sub-process.

● The execution of a process depends on a trigger. Execution may be continuous, it may be triggered in response to an external event, or it may need interaction with the actors.

● A process creates or modifies Digital Objects, either as an outcome or as a side-effect.

● A machine executable artefact (in this case a process) is itself a Digital Object.

● A user interacts with processes via Technical Services.

● A user can also be part of a process (actor) or the one who executes a process.

● The processes can be modified if the user has new requirements, so a user indirectly defines the process via the requirements (at least some parts of the process).

Policy has a dependency on

● Policies can be hierarchically grouped (parent child relations)

● Meta-policies can manage the activation or deactivation of policies

● The use of a Technical Service may be governed or restricted by a policy. Policies can define the overall behaviour of a Technical Service.

● During runtime policies can be queried as a rule base for making decisions.

● A process can be derived from policies, which describe the formal process chain. The policies act as a knowledge base that can be queried while the process runs.

● A process may include a statement of who or what performs each step in the process (the actors).

● Policies define general rules on how to process DOs.

● A policy can constrain a DO.

● Machine executable policies are themselves DOs.

● Policies define the overall system behaviour. This includes what the user can do with the system, permission handling and the way information is presented.

● Policies are indirectly derived from the user (community) through user requirements.

Table 12: Comparison of dependencies of the five entity types Digital Object, User, Technical Service, Process and Policy
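
The dependency relations in Table 12 can be read as a directed graph whose nodes are typed entities. The following is a minimal sketch of that reading, not part of the PERICLES model itself: the class names (`EntityType`, `Entity`, `Ecosystem`), the method names and the example entities are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional


class EntityType(Enum):
    """The five entity types compared in Table 12."""
    POLICY = "PO"
    TECHNICAL_SERVICE = "TS"
    PROCESS = "PS"
    DIGITAL_OBJECT = "DO"
    USER = "US"


@dataclass(frozen=True)
class Entity:
    name: str
    type: EntityType


@dataclass
class Ecosystem:
    # Maps each entity to {entity it depends on: short reason}.
    depends_on: Dict[Entity, Dict[Entity, str]] = field(default_factory=dict)

    def add_dependency(self, source: Entity, target: Entity, reason: str) -> None:
        """Record that `source` depends on `target` for the given reason."""
        self.depends_on.setdefault(source, {})[target] = reason
        self.depends_on.setdefault(target, {})  # make sure the target is a known node

    def dependencies_of(self, source: Entity,
                        of_type: Optional[EntityType] = None) -> List[Entity]:
        """Direct dependencies of `source`, optionally filtered by entity type."""
        targets = self.depends_on.get(source, {})
        return sorted(
            (t for t in targets if of_type is None or t.type is of_type),
            key=lambda t: t.name,
        )


# Hypothetical example entities, loosely following the "User" and
# "Digital Object" rows of the table.
archivist = Entity("archivist", EntityType.USER)
repository_ui = Entity("repository-ui", EntityType.TECHNICAL_SERVICE)
access_policy = Entity("access-policy", EntityType.POLICY)
report = Entity("report.pdf", EntityType.DIGITAL_OBJECT)

eco = Ecosystem()
eco.add_dependency(archivist, repository_ui, "gateway to all other entities")
eco.add_dependency(archivist, access_policy, "constrains what the user may access")
eco.add_dependency(report, repository_ui, "rendered and stored by the service")

print([e.name for e in eco.dependencies_of(archivist)])
```

Storing a reason with every edge keeps the textual justification from the table next to the relation, which may be useful when the graph is later inspected by a person rather than a program.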

Types of change Policy (PO) Technical Service (TS) Process (PS) Digital Object (DO) User (US)

User is subject to change or can trigger a change due to

● A new user community is adding or using data with different requirements for their data. It can lead to policy changes to meet the requirements.

● Evolution of semantics, terminology and knowledge: these are typically specific to user communities and are required for the interpretation and understanding of the content. They are subject to change over time. Significant changes lead to changes in policy.

● The users of the service may change, and with them the expectations of what the Technical Service should do for the user: different skill set, knowledge base, method of use and new customs (eBook reader, podcasts …), semantic changes.

● If semantics, terminology and knowledge of the designated user community changes, it may be necessary to modify processes.

● New requirements and habits can influence the modelling of processes.

● A user can create, modify or retire DOs via a Technical Service.

● The user’s background knowledge, semantics or interests change, so that the user can no longer make use of the object.

● There might be side effects that originate from changes by a different user community if the representation or semantics of a DO has been changed.

● The user is removed from the system.

● The role of the user changes.

● The designated user community can change over time (knowledge and terminology change).

● A user has new requirements (new objects, attributes, processes, new terminology, dissemination standards)

A change in Digital Objects can cause a change to entity ...

● Altered or new DO types can require new or changed policies.

● The scale of the objects to be processed by the system/service increases

● Format changes of DO

● A process processes Digital Objects. If the structure of a DO changes (new attributes, metadata), the process may no longer be able to process the objects.

● Changes in dependent DOs (attributes, semantics) can lead to a chain of changes to other dependent DOs.

● The Digital Object needs to adapt to new requirements or semantic changes. If the representation of the Digital Object is changed without the involvement of the designated community, it may cause a problem in understanding the Digital Object.

A change in Technical Services can cause a change to entity …

● Changes in the system topology can lead to changes in policy.

● New content types are added that were not envisaged when the policy was created.

● If new technical services become available (e.g. metadata extraction) or new applications are introduced then there might be a policy change to adopt these.

● The service also has an environment, such as the operating system. Interfaces can change when a new OS is installed, which can cause the system to stop working.

● Hardware can become obsolete or defective.

● Other Technical Services are accessed in the process steps. If their interface changes then the process may fail.

● Changes in the underlying hardware and software can require changes in the Digital Objects (format).

● New metadata extraction tools could result in improvements to existing algorithms or the extraction of new or updated metadata fields.

● New dependencies between Digital Objects could be extracted or existing ones enhanced.

● To perform actions on Digital Objects the user has to use Technical Services. Change can flow in both directions: a change to a Technical Service made for other reasons can influence the way the user interacts with Technical Services.

A change in a Process can cause a change to entity ...

● Standards or processes used in the system become obsolete or are no longer supported. Policies that cover these areas need to be retired.

● Processes and DOs are changed, so the old policies no longer apply and must be changed.

● A system/service used in a process changes its interface.

● A system/service used in a process changes its prerequisite systems/services.

● The process that uses a system or service to implement the functionality is changed.

● The process can be performed by different actors.

● The process is now a sub-process of other processes.

● A process may become obsolete if the process flow totally changes due to other Technical Services or Digital Objects.

● A change in the process chain that affects either the processing of DOs or the use of DOs.

● The user is an actor in several processes. Changes can affect the role of the user in the process. Or the process needs to be changed due to new requirements or semantic change.

A change in Policy can cause a change to entity ...

● Policy is updated according to scheduled policy review

● A policy can expire according to a schedule or event

● Meta-policies can manage policies.

● Policies which govern the use of particular systems or services may change.

● Policies may change the interaction and architecture of Technical Services.

● Processes are derived from policies so changes in policies can affect the process flow.

● Policies may be applied during process execution.

● Policies may define user roles and access rights.

● Interaction and processing of DOs.

● Semantics of a DO if policies are used to describe the outline and functionality of a DO.

● Legal and licensing arrangements for Digital Objects.

● Changes from the designated user can lead to modifications of policies or new policies.

● Changes in policies can influence what the user can do and see in the system.

External changes

(other factors that can lead to a change)

● A change in strategy of the data holder or data owner requires a change in policy.

● Legal changes or changes in governance could lead to policy changes.

● Updated data standards or processes are available requiring a change in policy to implement them in the organisation.

● The archive accepts content types that were not envisaged when the policy was created.

● A change in risk profile of the owners or user communities requires a change in policy.

● System/service used in a process is no longer supported by the vendor or is no longer available

● Maintenance becomes prohibitively expensive

● New technical services become available (e.g. metadata extraction) and there is a policy change to adopt these.

● A technical service/software used in a process is no longer supported by the vendor

● Updates to data quality assurance standards or requirements, or more stringent requirements from new applications result in additional or updated quality assurance processes

● A new (improved) process is defined to replace the existing process with the aim(s) that the process is cheaper to perform, the process is quicker to perform, the process is more flexible in that it produces additional or more useful products.

● Quality assurance methods applied to the Digital Objects.

● Storage requirements (e.g. replication).

● Metadata requirements (e.g. provenance).

● Data representation standards, packaging or exchange formats (including e.g. preservation actions).

● Data owners can sometimes directly influence policies, or dictate that data (or a data collection) is subject to specific policies.

Table 13: Comparison of change of the five entity types Digital Object, User, Technical Service, Process and Policy
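
Table 13 implies that a change to one entity can ripple through everything that depends on it, directly or indirectly. One way to sketch this is a breadth-first search over the inverted dependency graph. The function name `affected_by`, the string identifiers and the example graph below are assumptions made for illustration; the deliverable does not prescribe this algorithm.

```python
from collections import deque
from typing import Dict, Set


def affected_by(changed: str, depends_on: Dict[str, Set[str]]) -> Set[str]:
    """Return every entity that directly or transitively depends on `changed`.

    `depends_on` maps an entity identifier to the identifiers it depends on.
    """
    # Invert the graph: for each entity, who depends on it?
    dependants: Dict[str, Set[str]] = {}
    for entity, targets in depends_on.items():
        for target in targets:
            dependants.setdefault(target, set()).add(entity)

    affected: Set[str] = set()
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for dependant in dependants.get(current, ()):
            # The `affected` set also protects against cycles in the graph.
            if dependant != changed and dependant not in affected:
                affected.add(dependant)
                queue.append(dependant)
    return affected


# Hypothetical ecosystem: identifiers are prefixed with the entity type
# abbreviations used in the table (PO, TS, PS, DO, US).
example_graph = {
    "PS:ingest": {"TS:metadata-extractor", "PO:metadata-policy"},
    "DO:report.pdf": {"PS:ingest"},
    "US:archivist": {"TS:repository-ui"},
}

print(sorted(affected_by("TS:metadata-extractor", example_graph)))
```

The result is only a list of candidates for review. Whether a dependent entity really has to change is a judgement that, according to the table, depends on the kind of change involved.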

Lifecycles Policy (PO) Technical Service (TS) Process (PS) Digital Object (DO) User (US)

Planning and definition

● The policy is planned and defined according to the requirements and goals of the preservation system and the designated user community.

● Here the systems and services including all dependencies to processes, Digital Objects, policies and users are defined. Several systems can be involved. So in this phase a system architecture plan is created.

● The input and desired output or goals are defined. To produce the output, the necessary process steps, process flow and systems need to be identified. Policies can be used to describe the formal process.

● In this phase the initial structure of the DO and its relations is modelled according to the requirements for the DO and according to the requirements of other entities.

● In this step a designated user community or several designated user communities are defined. It is important to understand their requirements, terminology and background knowledge. It is not only important to understand the needs, but also to write them down in a structured way.

Modelling

● The modelling passes through several iterations and transformations. A high level formal policy is mapped to the entities and later to a more formal expression (e.g. with a structured vocabulary).

● Detailed modelling of the involved systems and components, responsibilities and information flow between the systems.

● At this phase the process is modelled in a more formal way by the usage of an appropriate process modelling language.

● The DO is implemented and tested as defined in the planning and definition phase.

● The policies and roles of the user are created. Also the terminology is mapped to the system. A secondary task is to create a user instance inside the computer system.

Implementation and test

● The modelled policy needs to be converted into an executable form that maps to the Technical Services.

● The planned systems and services will be implemented/installed, including all dependent entities.

● The process, its steps and the policies are implemented on the different systems inside the digital ecosystem.

● The DO and the properties are implemented and tested.

● The view that the user expects and the policies are implemented and tested.

Enable

● Create the policy’s enacting processes (both computer and human based); check for consistency; and validate/QA them.

● After testing the Technical Services are put into production.

● The process is enabled.

● First version of the DO is enabled in the system.

● Model of user is enabled.

Monitoring and propose change

● Verify that the effects of the processes enacted by a policy conform to the policy definition and the requirements. This needs to be checked again if the policy has changed.

● Propose changes to the policies if it becomes apparent that there is a shift in the digital ecosystem.

● Meta-policies help to manage policies. They can intervene in the policy lifecycle (e.g. a policy retirement could be triggered by a meta-policy) and define a policy on policy change (e.g. require approval by X for a change).

● Technology watch: a significant change (hardware, software, interfaces of other Technical Services, user habits etc.) can lead to the lifecycle event “improve/modify system”. These changes are triggered from the outside.

● All running processes are monitored; the state of execution, the input/output and the steps can be traced. The monitoring cycle helps to find problems and delivers information about how the process needs to be optimized.

● Monitor internal and external change factors impacting Digital Objects.

● Assess scale and impact of detected change. The meaning of the object and/or other dependent objects could totally change. Identify potential actions and associated risks.

● Constantly monitor your designated user community. Watch for any changes in knowledge and terminology. Also keep track of new communities.

● If there is a significant change in knowledge or terminology, or a different user community is added, then the system needs to be modified.

Approve and implement change

● After a change is proposed and accepted, the change must be approved, modelled and implemented into a policy. The last step is to enable the updated policy.

● After the Technical Services have been established, the main task of lifecycle management is to constantly improve the system/services and add features to them. An updated component and architecture plan needs to be created, and the impact on other systems needs to be analysed.

● Based on the information from monitoring, the process is modified (optimized) if necessary. It then goes through the cycle of modelling, implementing, enabling and monitoring again.

● If there are significant changes, model the necessary changes to the DO and to the other entities. Create a new revision of the entities, then apply and validate the changes.

Re-Use

● Policy is re-used for similar use cases.

● A Technical Service is re-used for similar tasks.

● Process is re-used and adapted for similar use cases.

● Object is re-used and adapted for similar use cases.

Retire

● A policy is retired and the dependencies are removed from the system. Requires consistency checks and removal of disconnected processes (with human intervention).

● After some time the lifecycle ends and a service/system is retired.

● After some time the process is retired, because it is obsolete, because a new process replaces it, or because new requirements call for a new process.

● Object is obsolete and will be retired.

● In this case the designated community vanishes. The question is what happens to the designated user community in the system. Normally it should be kept there for future investigation.

Table 14: Comparison of lifecycles of the five entity types Digital Object, User, Technical Service, Process and Policy
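
The lifecycle stages compared in Table 14 can be read as a small state machine shared by all five entity types. The following is a minimal sketch under stated assumptions, not the PERICLES implementation: the class names `Stage` and `Entity`, the transition table, and the approval rule (modelled on the meta-policy example "require approval by X for change") are hypothetical and chosen only for illustration. Re-use is sketched as deriving a new entity rather than as a stage of its own.

```python
from enum import Enum


class Stage(Enum):
    MODEL = "model"
    IMPLEMENT = "implement"
    ENABLE = "enable"
    MONITOR = "monitor and propose change"
    CHANGE = "approve and implement change"
    RETIRE = "retire"


# Assumed transitions: the table describes the stages in prose and does not
# define a formal transition relation.
TRANSITIONS = {
    Stage.MODEL: {Stage.IMPLEMENT},
    Stage.IMPLEMENT: {Stage.ENABLE},
    Stage.ENABLE: {Stage.MONITOR},
    Stage.MONITOR: {Stage.CHANGE, Stage.RETIRE},
    Stage.CHANGE: {Stage.MODEL},  # a change re-enters the modelling cycle
    Stage.RETIRE: set(),          # terminal stage
}


class Entity:
    """Hypothetical ecosystem entity (PO, TS, PS, DO or US)."""

    def __init__(self, name, kind):
        self.name = name
        self.kind = kind
        self.stage = Stage.MODEL
        self.revision = 1
        self.history = [Stage.MODEL]

    def advance(self, target, approved_by=None):
        """Move to the target stage if the assumed transition table allows it."""
        if target not in TRANSITIONS[self.stage]:
            raise ValueError(f"{self.stage.name} -> {target.name} is not allowed")
        if target is Stage.CHANGE:
            # Assumed meta-policy style rule: a change needs a named approver.
            if approved_by is None:
                raise PermissionError("change requires approval")
            self.revision += 1  # an approved change creates a new revision
        self.stage = target
        self.history.append(target)

    def reuse(self, new_name):
        """Re-use: derive a new entity of the same kind for a similar use case."""
        return Entity(new_name, self.kind)


# Usage: walk a policy through the cycle up to an approved change.
policy = Entity("retention-policy", "PO")
for stage in (Stage.IMPLEMENT, Stage.ENABLE, Stage.MONITOR):
    policy.advance(stage)
policy.advance(Stage.CHANGE, approved_by="X")
print(policy.stage.value, policy.revision)
```

Keeping the transitions in a single dictionary makes the assumed lifecycle easy to inspect and to adjust per entity type, for example if a Digital Object and a Policy turn out to need different retirement rules.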

10. Annex B: Stakeholder roles

SCIENCE MEDIA

Digital ecosystem role

Description (primary role)

Name Description Interest / stake Name Description Interest / stake

Data Producer

Creation of data

SOLAR Scientists

SOLAR scientists specify experiments on SOLAR and process and analyse the raw data

1. Access to raw data (from operations centre) 2. Processing of raw data 3. Reuse of archived SOLAR data 4. Preservation of experimental data and metadata

Creator (Digital Arts)

Creator of art work or their representatives (e.g. galleries and technicians or estates).

1. Concerned regarding continued access/availability of their work. 2. Primary source of information regarding how the work was made and what is important to preserve about the work. 3. Require 'authentic' display of their artwork, both in terms of the immediate display environment and changes to technology (e.g., image quality resulting from file types). Authenticity regarding a technology based artwork is determined on a case by case basis depending on what is considered significant regarding that work of art. In some cases the specific technology used might carry important meaning, in other cases the behaviour may be more important to preserve. 4. More generally, the artist works closely with the curator and the conservator to ensure the communication and maintenance of certain conditions/locations for displaying their artwork.

Mission operator

Responsible for operating experiments for scientists on SOLAR.

1. Want to learn from the past (e.g. anomalies) 2. Need to rapidly find and navigate through data.

Media Producer (Digital Media)

Creators and intellectual 'owners' of content.

1. Concerned with the production of high quality footage relating to Tate and its programmes.

Creator (Born-digital)

Creator of the archive material or their representatives (e.g. galleries or estates); they may be the same as the person or institution from whom the collection is donated or purchased for Tate's Archive.

1. Concerned with continued access/availability of the materials. 2. Concerned with the proper management, cataloguing and preservation of the collection.

Data Consumer

Reuse of data

Scientists (same domain)

Scientists who are peers of the SOLAR scientists, and conduct similar experiments and/or evaluate or critique the way the SOLAR team conduct their experiment and calibration work.

1. Reuse of archived data in their own experiments (merging of similar datasets) 2. Critical analysis of experiments, including processing pipeline, engineering and operations data. 3. Re-running of processing pipelines or enhanced version.

End User (Digital Arts)

End users in Digital Art preservation; may be visitors to the gallery, art professionals using the gallery for research or scholarship as well as those charged with the stewardship of collections.

1. Expect access to artwork. 2. Require authenticity to be ensured for consultation or study.

Scientists (in other domain)

Scientists that use the SOLAR data for their own research. (This might for example include climate scientists).

1. Make it easier for them to reuse the data: a. Want to find and visualise the data. b. Need to be able to understand and interpret the data, in order to reuse it.

End User (Digital Media)

In the media context, End Users (specifically in reference to the PERICLES project) may be external production companies, internal producers within the organisation, or members of the public who might view the productions.

1. Require access to final media productions and media rushes. 2. Require access to related (meta-?) information.

Engineers

Engineers who work on the payload. (Ground system engineers are of less interest for PERICLES).

1. Need to understand legacy missions. 2. Interested in knowledge management (e.g. capture and reuse of decisions made in the engineering process). 3. Rapid navigation of engineering documents and linked information.

End User (Born-digital)

Academics or other researchers interested in the archive material at Tate (art historians, other academics/scholars, museum curators, picture researchers, artists, provenance researchers; c. 80% of Tate Archive’s users are academic researchers, post-graduate and above).

1. Require access to materials (or copy of original). 2. Require guarantee of (or information on) authenticity (including information on what preservation actions have been performed).

Data Owner

Data and policy ownership

Data owners

Representatives of major funding agencies (e.g. ESA, NASA). Also institutional owners (e.g. Fraunhofer in Freiburg)

1. Promoting preservation and reuse of SOLAR data. 2. Specification of high-level policies and guidelines. 3. Funding of archive implementation and maintenance.

Data Owner (Digital Art, Digital Media, Born-digital)

Senior management within the institution ultimately responsible for data and objects within its control.

1. Ultimately responsible for Tate's high value digital assets (library, archive and art collections). 2. Perspective is likely to be higher-level, less interested in day-to-day issues, and more concerned with long-term issues of access.

Data Manager

Data management and policy implementation

Archive manager

Responsible for the (implementation of) SOLAR archive data policies and processes (does not yet exist)

1.

Archivist (Born-digital)

Ensures that materials are preserved and organised. Archivists are responsible for the development of the collection and for its presentation within the archive context. They are responsible for selecting what should and should not be preserved. Archivists may withdraw, redact, or destroy materials in a collection.

1. Manage preservation efforts with the Tate Archive. 2. Catalogue archive collections and related information. 3. Are responsible for ingesting born digital archival objects/information materials into Tate systems. 4. Need to ensure software and media formats are usable by scholars in the future.

Conservators (Digital Arts)

Specialists who are responsible for the continued preservation of artworks.

1. Understand what is important to preserve, design and implement conservation plan for the artwork. 2. Advise to ensure that current digital artworks are accessible in future.

Collection managers, art handlers and registrars (Digital Arts)

Tate staff overseeing processes at management and administrative levels (e.g., registrars for acquisition, and art handlers).

1. Ensure proper procedures are followed regarding the acquisition, transport, handling, storage and tracking of artworks.

Curator (Digital Arts)

Curators are responsible for development and presentation of a collection and are responsible for understanding, maintaining and communicating the art historical context and significance of a work.

1. Instigate the acquisition of works into the collection. 2. Are important in relationship management with the artists and galleries. 3. Understand the implications of any changes on the art historical reading of a work. 4. Are responsible for contextualising the work within the collection and the gallery.

Media Staff (Digital Media)

Digital media staff responsible for locating and editing Tate footage. In addition, they help develop and implement any solutions, policies or systems.

1. Need efficient access to film/media footage. 2. Need suitable IT/IS infrastructure to enable storage of, and work on, film/media footage.

Technical support

Technical Support - IT/IS Staff (Digital Arts, Digital Media, Born-digital)

Tate team overseeing the storage and access of electronic media.

1. Responsible for infrastructure for storing artwork. 2. Responsible for implementation of policies associated with the bit preservation of digital data. 3. Responsible for the development, management and support of the main collection management system and any associated systems.

Table 15: Stakeholder roles from WP2 science and art and media