

Page 1: European digital repositories: an overview ELAG 2006, Bucharest 26.-28.4.2006 Juha Hakala Helsinki University Library

European digital repositories: an overview

ELAG 2006, Bucharest 26.-28.4.2006

Juha Hakala

Helsinki University Library

Page 2

Contents

- The problem
- The solution: legal, organisational and technical aspects
- Standards – metadata and other stuff
- Open source applications
  - Web archives (IIPC)
  - Institutional repositories
- Commercial applications

Page 3

The problem

“…digital preservation is going to be an enormous issue – a very fundamental problem at all levels from the nation-state to the individual. In my view, it’s going to attract increasing commercial interest, as well as growing unease and concern from the general public, over the next decade.”

Clifford Lynch, Where do we go from here? The next decade for digital libraries. D-Lib Magazine, July/August 2005

Page 4

The problem (2)

- Digital preservation is not only a technical problem, but also a legal and organisational one
  - There has to be a legal or contractual basis for preserving e.g. publications, governmental publications and research data
  - These laws must assign responsibility to a limited set of organisations, which must have sufficient human and other resources for the task
- How do we decide what deserves to be preserved, given that not everything can be kept?

Page 5

The solution: legal aspects

- In Europe, many countries have revised their legal deposit acts so that they cover a broad array of digital assets
  - This trend started in Norway (I think) in the early 1990s
- The new legal deposit acts share features, thanks to co-operation between the law makers
  - E.g. harvesting the national Web space
- Some aspects of preservation relate to the copyright act; it has to be revised as well
  - A digital archive must be able to copy and migrate the documents (and remove copy protection if needed)
  - The EU Copyright Directive makes this possible

Page 6

The solution: organisational aspects

- Digital preservation requires co-operation across traditional organisational borders
  - Libraries, archives and museums must join forces with other relevant players in the public sector, publishers, book sellers, the IT business, etc.: all the money and support we can get will be needed
- At the national level, there should nevertheless be one organisation to co-ordinate the effort
- International co-operation has already shown its usefulness, but we need more of it
  - International Internet Preservation Consortium, IIPC

Page 7

The solution: technical aspects

- A digital archive must guarantee continuous access to and usage of the archived assets. Therefore, it must:
  - Keep the bits, and
  - Migrate the assets to new SW platforms, and/or
  - Emulate the original HW/SW environment in the new technical environment
- Migration and emulation must both be used; the best choice depends on the asset
  - There is no agreement on the best overall strategy
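The "keep the bits" requirement can be illustrated with a fixity check: the archive records a checksum for each file at ingest and later recomputes it to detect silent corruption. The sketch below is only an illustration under stated assumptions. The function names, the manifest layout (relative path mapped to a hex digest) and the choice of SHA-256 are hypothetical and do not come from the talk.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_fixity(manifest: dict[str, str], root: Path) -> list[str]:
    """Return the relative paths that are missing or whose digest changed.

    `manifest` is a hypothetical structure: relative path -> digest
    recorded when the file was ingested.
    """
    damaged = []
    for relative_path, expected in manifest.items():
        target = root / relative_path
        if not target.is_file() or sha256_of(target) != expected:
            damaged.append(relative_path)
    return damaged
```

A check like this only detects damage; repairing it still requires redundant copies, and it says nothing about whether the format can still be rendered, which is where migration and emulation come in.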

Page 8

The solution: technical aspects (2)

- We do not know how often we need to migrate assets and how hard that is going to be, or how often we must build a new emulator and how complicated that will be
  - It is impossible to make an estimate of the cost of digital archiving
- Digital archaeology: develop means of helping the users to access “out-of-date” resources
  - An expensive means of accessing data which has not been preserved properly

Page 9

The solution: technical aspects (3)

- There isn’t much serious research on emulation and migration; among the national libraries, the Koninklijke Bibliotheek (KB) is probably unique in its investment in this
- Luckily, the results from the KB will most likely be universally applicable
  - Plenty of room for European / global co-operation
- Proving that an archive works for 1000 years will be difficult; much harder than proving with 100 % certainty that it does not…

Page 10

Standards: metadata and other stuff

- A preservation method as such is only the backbone of an operational system; other things are needed, such as:
  - Overall architecture of the digital archive
  - Preservation metadata
- The Open Archival Information System (OAIS) gives us a good starting point for the former, although it has been extended in various ways in real-life digital archiving projects like NEDLIB
  - Interestingly, we lack a similar standard for digital asset management systems (and an agreement on who should develop it)
- We still lack a proper understanding of preservation metadata
  - ISBD(ER) is definitely not sufficient from the preservation point of view
- New identifiers and identifier systems capable of covering a wide range of digital assets must be designed as well.
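As a rough illustration of the architecture point, the OAIS reference model distinguishes three kinds of information package: the Submission Information Package (SIP) that a producer delivers, the Archival Information Package (AIP) that the archive stores, and the Dissemination Information Package (DIP) that a user receives. The sketch below shows one way preservation metadata could be attached at ingest. All class names, field names and the metadata keys are assumptions made for this example; they are not taken from the OAIS text, from NEDLIB or from any real system.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class SubmissionPackage:
    """SIP: what a producer hands over to the archive."""
    identifier: str
    content: bytes
    descriptive_metadata: dict


@dataclass(frozen=True)
class ArchivalPackage:
    """AIP: what the archive stores, with preservation metadata added."""
    identifier: str
    content: bytes
    descriptive_metadata: dict
    preservation_metadata: dict


@dataclass(frozen=True)
class DisseminationPackage:
    """DIP: what is delivered to a user on request."""
    identifier: str
    content: bytes
    descriptive_metadata: dict


def ingest(sip: SubmissionPackage, file_format: str) -> ArchivalPackage:
    """Turn a SIP into an AIP by recording hypothetical preservation metadata."""
    preservation = {
        "format": file_format,  # needed later to plan migration or emulation
        "sha256": hashlib.sha256(sip.content).hexdigest(),
        "size_bytes": len(sip.content),
    }
    return ArchivalPackage(sip.identifier, sip.content,
                           dict(sip.descriptive_metadata), preservation)


def disseminate(aip: ArchivalPackage) -> DisseminationPackage:
    """Build a DIP, refusing to deliver content that fails its checksum."""
    actual = hashlib.sha256(aip.content).hexdigest()
    if actual != aip.preservation_metadata["sha256"]:
        raise ValueError(f"fixity check failed for {aip.identifier}")
    return DisseminationPackage(aip.identifier, aip.content,
                                dict(aip.descriptive_metadata))
```

The point of the sketch is the separation of roles: descriptive metadata travels with the object from producer to user, while preservation metadata (format, checksum, size) is something the archive itself must create and maintain.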

Page 11

Applications

- There is no – and will not be – one digital archive to fit all purposes; instead, there will be domain / content specific modules, coupled with generic preservation tools
- There will be both open source and commercial applications, which can be used side by side in a broader digital library environment
  - The popularity of open source may slow down the development of commercial tools
  - Institutional repositories

Page 12

Web archiving

- Initiated by the Royal Library of Sweden in the mid-1990s; work continued in the NEDLIB project and finally in IIPC (International Internet Preservation Consortium)
- Within a decade, both the legal and the technical aspects of Web archiving have been solved, at least for the time being
- Many legal deposit acts incorporate Web harvesting; it is seen as the only feasible way of extending legal deposit to the Web

Page 13

Web archiving (2)

- The market for Web archiving applications is very small; therefore some (primarily) European national libraries formed IIPC with the Internet Archive
- IIPC has built a Web harvester (Heritrix) and special tools for indexing and accessing the harvested resources
- All tools are available for free, and they are still being developed by the growing consortium

Page 14

Web archiving: practice

- The Internet Archive has harvested 60 billion Web pages globally since ~1996
- Numerous countries have harvested their domestic Web spaces, either selectively or as much of it as possible; there is increasing awareness that this makes sense
  - Europe: about 30-40 % of countries doing something?
- Sustainability of the archives is not fully proven yet; exotic resources and the sheer size of the archive may become a problem eventually
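Harvesting a domestic Web space requires a rule for deciding which URLs belong to it. A minimal sketch of such a scope rule follows, assuming the national Web space is approximated by a country-code top-level domain plus a hand-maintained list of additional hosts. The `.fi` suffix, the function names and the parameters are assumptions for illustration only; this is not how Heritrix is configured, and real national harvests use richer criteria.

```python
from urllib.parse import urlsplit

# Hypothetical default: treat hosts under .fi as the national Web space.
NATIONAL_SUFFIXES = (".fi",)


def in_national_scope(url, suffixes=NATIONAL_SUFFIXES, extra_hosts=frozenset()):
    """Return True if the URL is an http(s) URL inside the assumed scope."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        return False
    host = parts.hostname or ""  # hostname is already lower-cased
    if host in extra_hosts:
        # Sites of national interest hosted under other domains.
        return True
    return host.endswith(tuple(suffixes))


def filter_frontier(urls, **scope_options):
    """Keep in-scope URLs, dropping duplicates but preserving order."""
    seen = set()
    kept = []
    for url in urls:
        if url not in seen and in_national_scope(url, **scope_options):
            seen.add(url)
            kept.append(url)
    return kept
```

The `extra_hosts` parameter reflects the choice mentioned above between harvesting "as much as possible" and harvesting selectively: a purely domain-based rule misses national content published under generic domains, so some manual selection is usually layered on top.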

Page 15

Institutional repositories

- Not really digital archives, but they are used as “short term substitutes” for them
- There are a few open source software packages available (such as Fedora, EPrints and DSpace), and in recent years they have developed faster than at least some of their (more generic) commercial counterparts
- We do not know yet how archiving-related software modules such as migration tools can be linked to the repository tools

Page 16

Commercial applications

- Two interrelated problems:
  - Target user group
  - Functionality and links to other (library) applications
- Digital archives will not be built (only) for libraries, but (national) libraries will probably be among the key customers
  - Will our needs be similar enough to those of e.g. national archives or pharmaceutical companies?
- It is expensive to build digital archive SW
  - Will library system vendors be able to make it?

Page 17

Commercial applications: functionality

- Should two national libraries write an RFP for a digital archive application, there would be plenty of disagreement on the details
- Vendors will have problems understanding what the libraries and other customers want; the situation will improve only as our own understanding of what digital archiving is improves

Page 18

Commercial applications: present

- IBM DIAS
  - The only digital archive application available now
  - Architecture based on the OAIS model
  - Functionality tuned to the requirements of the Koninklijke Bibliotheek; their system will contain more than 9 million scientific articles by the end of 2006
  - The only (?) other customer for now is DDB; more are needed to guarantee the survival of DIAS over the long term
- Others?
  - None yet, but it is likely that there will be at least a few more (just like Cliff Lynch predicted)
  - What kind of content will they deal with, and how?

Page 19

Conclusion

- The train is moving, but we are not far from the station we left
- The key problem will be (lack of) funding:
  - National libraries have problems with the cost of storing even printed materials (or at least my library has); the cost of storing digital assets will come on top of the cost of traditional storage
  - How do we prove that digital archiving is so important that the additional funding is justified?
  - How much can we cut the costs via further IIPC-like cooperation in developing applications and best practices?