regal - a repository for electronic documents and bibliographic data

Regal - a Repository for Electronic Documents and Bibliographic Data. Felix Ostrowski (graphthinking, @literarymachine), Jan Schnasse (hbz, @InspektorHicks). ELAG, June 11th 2014, University of Bath.

Upload: felix-ostrowski

Post on 11-May-2015

TRANSCRIPT

Page 1: Regal - a Repository for Electronic Documents and Bibliographic Data

graphthinking

a Repository for Electronic Documents and Bibliographic Data

Felix Ostrowski (graphthinking, @literarymachine)

Jan Schnasse (hbz, @InspektorHicks)

ELAG, June 11th 2014, University of Bath

Page 2: Regal - a Repository for Electronic Documents and Bibliographic Data

Rationale: A new foundation for Edoweb

● A system to gather, describe and archive deposit copies of electronic publications and websites on behalf of the State Library Center of Rhineland-Palatinate (LBZ)

● Operated by the North Rhine-Westphalian Library Service Center (hbz) since 2002

● Technical evolution: OPUS – Digitool – regal

Page 3: Regal - a Repository for Electronic Documents and Bibliographic Data

The current system and its shortcomings: Digitool

● Digitool end-of-life is coming

● Unwanted/unexpected dependencies on other projects hosted on the same Digitool instance

● Performance issues (we have millions of objects in Digitool)

● No easily configurable search indexes or OAI-PMH interfaces for single collections

● No out-of-the-box support for regional requirements (e.g. metadata delivery to the German National Library); extra money/developer hours needed

Page 4: Regal - a Repository for Electronic Documents and Bibliographic Data

The current system and its shortcomings: Homemade

● Mix of self-developed and Ex Libris components

● Vicious circle

– introduction of workarounds

– unpredictable migration costs

– decision to stay on obsolete version

– running out of support

– introduction of workarounds

● Administrative responsibilities in different hbz working groups

Page 5: Regal - a Repository for Electronic Documents and Bibliographic Data

Altogether, this leads to an expensive, hard-to-maintain and outdated system that doesn't satisfy our needs or those of our clients.

Page 6: Regal - a Repository for Electronic Documents and Bibliographic Data

The following aspects are mandatory to achieve our goals

● Increase the overall performance

● Provide an up-to-date, modern user interface

● Use open source software (Fedora, Elasticsearch, Drupal)

● Seamlessly import (meta-)data from Digitool and potentially other (repository) systems

● Integrate the system with the emerging Linked-Open-Data ecosystem, especially authority data

● Loosen the tight integration with Ex Libris Aleph

● Expose (meta-)data for easy discovery & re-use by others.

Page 7: Regal - a Repository for Electronic Documents and Bibliographic Data

Overview of the new architecture

[Architecture diagram; components shown: regal-drupal (frontend), regal (backend), Fedora, Elasticsearch, Ex Libris Aleph, lobid API]

Page 8: Regal - a Repository for Electronic Documents and Bibliographic Data

Data model

● Simple hierarchical data model consisting of nodes associated via hasPart and partOf relations

● Each node is identified by a namespace combined with a Universally Unique Identifier (UUID)

● Each node can have a bitstream and a metadata stream

● Metadata canonically stored as RDF N-Triples

● Bitstream can contain arbitrary data
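
The data model above can be sketched in a few lines of Python. This is only an illustration, not regal's actual (Java) implementation: the class name `Node`, its field names, the `namespace:uuid` identifier layout and the example triple are all assumptions made for the sketch.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass, field


@dataclass(eq=False)  # identity comparison avoids recursing through parent/children
class Node:
    """Hypothetical node: namespace + UUID, one metadata stream, one bitstream."""
    namespace: str
    node_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    metadata: str = ""                      # RDF serialized as N-Triples
    bitstream: bytes = b""                  # arbitrary binary content
    parent: Node | None = field(default=None, repr=False)         # partOf
    children: list[Node] = field(default_factory=list, repr=False)  # hasPart

    @property
    def pid(self) -> str:
        # Assumed identifier layout: "<namespace>:<uuid>"
        return f"{self.namespace}:{self.node_id}"

    def add_part(self, child: Node) -> Node:
        """Record both directions of the relation: hasPart and partOf."""
        child.parent = self
        self.children.append(child)
        return child


journal = Node("edoweb")
volume = journal.add_part(Node("edoweb"))
volume.metadata = (
    f'<{volume.pid}> <http://purl.org/dc/terms/title> "Volume 1" .\n'
)
```

Keeping both `parent` and `children` on the node mirrors the paired partOf/hasPart relations, at the cost of having to update both sides together.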

Page 9: Regal - a Repository for Electronic Documents and Bibliographic Data

Page 10: Regal - a Repository for Electronic Documents and Bibliographic Data

Fedora (3.7.1)

● mainly used to organize and associate multiple datastreams and their versions

● provides long-term accessible data storage

● usage of Proai as OAI-PMH solution

Page 11: Regal - a Repository for Electronic Documents and Bibliographic Data

Elasticsearch (1.1.0)

● Used to provide performant lookup (for metadata and full-text)

● Stores compacted JSON-LD

● Faceting can be used to browse the collection
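
As a rough illustration of the last two points, the sketch below shortens full property IRIs to context terms and builds a search request body that asks for term counts on one field. It is a toy: real JSON-LD compaction (as done by a JSON-LD library) handles far more cases, the context and field names are invented, and the slides do not say whether regal uses Elasticsearch's facet API or its terms aggregation. The request shown is the terms-aggregation form.

```python
# Hypothetical context; regal's real vocabulary is not given in the slides.
CONTEXT = {
    "title": "http://purl.org/dc/terms/title",
    "creator": "http://purl.org/dc/terms/creator",
}


def compact(doc: dict, context: dict) -> dict:
    """Toy compaction: replace full property IRIs with short context terms."""
    iri_to_term = {iri: term for term, iri in context.items()}
    compacted = {"@context": context}
    for key, value in doc.items():
        compacted[iri_to_term.get(key, key)] = value
    return compacted


def facet_request(field_name: str, size: int = 10) -> dict:
    """Search body that returns no hits, only value counts for one field."""
    return {
        "size": 0,
        "query": {"match_all": {}},
        "aggs": {field_name: {"terms": {"field": field_name, "size": size}}},
    }


record = compact(
    {
        "@id": "edoweb:123",
        "http://purl.org/dc/terms/title": "An example title",
        "http://purl.org/dc/terms/creator": "http://example.org/person/1",
    },
    CONTEXT,
)
request_body = facet_request("creator")
```

Short keys such as `creator` are what make the compacted form convenient to index: the field name in the document is the same name used in the facet request.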

Page 12: Regal - a Repository for Electronic Documents and Bibliographic Data

Backend / API

● Java Web API (RESTful) implemented with Jersey

● Abstracts access to storage & indexing, transparently updates Fedora and different Elasticsearch indexes

● Provides resources as OAI-ORE aggregations
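
The "abstracts access to storage & indexing" idea can be sketched as a small facade. The real backend is a Java/Jersey web API talking to Fedora and Elasticsearch; the Python classes below (`InMemoryStore`, `InMemoryIndex`, `RepositoryFacade`) are hypothetical in-memory stand-ins that only show the write path.

```python
class InMemoryStore:
    """Stand-in for the storage layer (Fedora in the slides)."""

    def __init__(self):
        self.objects = {}

    def save(self, pid, metadata, bitstream):
        self.objects[pid] = (metadata, bitstream)

    def delete(self, pid):
        self.objects.pop(pid, None)


class InMemoryIndex:
    """Stand-in for one search index (Elasticsearch in the slides)."""

    def __init__(self, name):
        self.name = name
        self.docs = {}

    def index(self, pid, metadata):
        self.docs[pid] = metadata

    def delete(self, pid):
        self.docs.pop(pid, None)


class RepositoryFacade:
    """Single entry point: one call updates the store and every index."""

    def __init__(self, store, indexes):
        self.store = store
        self.indexes = list(indexes)

    def put(self, pid, metadata, bitstream=b""):
        self.store.save(pid, metadata, bitstream)
        for index in self.indexes:
            index.index(pid, metadata)

    def delete(self, pid):
        self.store.delete(pid)
        for index in self.indexes:
            index.delete(pid)


repo = RepositoryFacade(
    InMemoryStore(), [InMemoryIndex("public"), InMemoryIndex("internal")]
)
repo.put("edoweb:123", {"title": "An example title"})
```

Callers never touch the store or the indexes directly, so the two cannot drift apart through a forgotten update.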

Page 13: Regal - a Repository for Electronic Documents and Bibliographic Data

Drupal Frontend

● Re-use of common features

– User management

– Template-system

– Field API

– RDF Mappings

– HTML-Form API

● Extended with custom modules for

– Storage Backend

– Linked Data Fields

– JavaScript UI enhancements

Page 14: Regal - a Repository for Electronic Documents and Bibliographic Data

No big surprises for plaintext input...

Page 15: Regal - a Repository for Electronic Documents and Bibliographic Data

Catalinking

Page 16: Regal - a Repository for Electronic Documents and Bibliographic Data

Simple lookup widget with configurable data sources (currently only the lobid API is implemented)
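
One way to picture "configurable data sources" is a registry of interchangeable lookup functions. The sketch below is an assumption about the general shape, not the widget's real code (which lives in the Drupal/JavaScript frontend): `LookupWidget`, `static_source` and the example.org URIs are invented, and the source shown searches a fixed list instead of calling the lobid API.

```python
from typing import Callable, Dict, List

Suggestion = Dict[str, str]                 # {"label": ..., "uri": ...}
Source = Callable[[str], List[Suggestion]]  # query string -> suggestions


def static_source(entries: List[Suggestion]) -> Source:
    """Build a source that searches a fixed list (stand-in for a remote API)."""
    def lookup(query: str) -> List[Suggestion]:
        needle = query.lower()
        return [e for e in entries if needle in e["label"].lower()]
    return lookup


class LookupWidget:
    """Holds named sources; adding a source needs no change to the widget."""

    def __init__(self) -> None:
        self._sources: Dict[str, Source] = {}

    def register(self, name: str, source: Source) -> None:
        self._sources[name] = source

    def suggest(self, name: str, query: str) -> List[Suggestion]:
        return self._sources[name](query)


widget = LookupWidget()
widget.register("authors", static_source([
    {"label": "Goethe, Johann Wolfgang von", "uri": "http://example.org/p/1"},
    {"label": "Schiller, Friedrich", "uri": "http://example.org/p/2"},
]))
matches = widget.suggest("authors", "goethe")
```

Each suggestion carries a URI next to its label, which is what lets the form store a link instead of a plain string.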

Page 17: Regal - a Repository for Electronic Documents and Bibliographic Data

Page 18: Regal - a Repository for Electronic Documents and Bibliographic Data

Additional linked data is integrated on-the-fly

Page 19: Regal - a Repository for Electronic Documents and Bibliographic Data

Page 20: Regal - a Repository for Electronic Documents and Bibliographic Data

Client-side sorting (and soon also searching) of linked data

Page 21: Regal - a Repository for Electronic Documents and Bibliographic Data

Exposing data

Page 22: Regal - a Repository for Electronic Documents and Bibliographic Data

Page 23: Regal - a Repository for Electronic Documents and Bibliographic Data

Page 24: Regal - a Repository for Electronic Documents and Bibliographic Data

Importing data

Page 25: Regal - a Repository for Electronic Documents and Bibliographic Data

This is simply a shortcut; any linked data URI can be used.

Page 26: Regal - a Repository for Electronic Documents and Bibliographic Data

Tada!

Page 27: Regal - a Repository for Electronic Documents and Bibliographic Data

Page 28: Regal - a Repository for Electronic Documents and Bibliographic Data

Managing structure

Page 29: Regal - a Repository for Electronic Documents and Bibliographic Data

Possible child nodes: in the case of a monograph these are only files. Journals provide more complex structures (volumes, issues, articles).
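
Such per-type rules could be expressed as a simple lookup table. The slide only states that monographs contain files and that journals involve volumes, issues and articles; the exact nesting below (journal > volume > issue > article > file) and all names are assumptions for illustration.

```python
# Hypothetical table: which child types each node type may contain.
ALLOWED_CHILDREN = {
    "monograph": {"file"},
    "journal": {"volume"},
    "volume": {"issue"},
    "issue": {"article"},
    "article": {"file"},
    "file": set(),
}


def can_contain(parent_type: str, child_type: str) -> bool:
    """True if a node of parent_type may get a child of child_type."""
    return child_type in ALLOWED_CHILDREN.get(parent_type, set())


def possible_children(parent_type: str) -> list:
    """Sorted list of child types to offer in the user interface."""
    return sorted(ALLOWED_CHILDREN.get(parent_type, set()))
```

A form could call `possible_children("journal")` to decide which "add child" actions to show for the current node.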

Page 30: Regal - a Repository for Electronic Documents and Bibliographic Data

Page 31: Regal - a Repository for Electronic Documents and Bibliographic Data

Basic technical metadata added by the backend.

Page 32: Regal - a Repository for Electronic Documents and Bibliographic Data

Move an object by setting its new parent.
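
In a hasPart/partOf tree, a move amounts to re-pointing one parent reference. A minimal sketch, assuming a hypothetical `TreeNode` class; the cycle check is an addition of this sketch and is not mentioned in the slides.

```python
class TreeNode:
    """Hypothetical tree node with a parent link and a list of children."""

    def __init__(self, name):
        self.name = name
        self.parent = None
        self.children = []

    def set_parent(self, new_parent):
        """Move this node (and its whole subtree) under new_parent."""
        # Refuse moves that would place a node inside its own subtree.
        ancestor = new_parent
        while ancestor is not None:
            if ancestor is self:
                raise ValueError("cannot move a node below itself")
            ancestor = ancestor.parent
        # Detach from the old parent, then attach to the new one.
        if self.parent is not None:
            self.parent.children.remove(self)
        self.parent = new_parent
        if new_parent is not None:
            new_parent.children.append(self)


volume_1, volume_2, issue = TreeNode("vol 1"), TreeNode("vol 2"), TreeNode("issue 3")
issue.set_parent(volume_1)
issue.set_parent(volume_2)   # the issue now belongs to volume 2 only
```

Because children hang off the moved node, the subtree travels with it and only one relation pair has to change.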

Page 33: Regal - a Repository for Electronic Documents and Bibliographic Data

Faceted search, brought to us by Elasticsearch

Page 34: Regal - a Repository for Electronic Documents and Bibliographic Data

Facets can be added and removed individually.

Page 35: Regal - a Repository for Electronic Documents and Bibliographic Data

Page 36: Regal - a Repository for Electronic Documents and Bibliographic Data

Anybody can say anything about anything...

Page 37: Regal - a Repository for Electronic Documents and Bibliographic Data

Local views on remote resources, e.g. authors and classifications.

Page 38: Regal - a Repository for Electronic Documents and Bibliographic Data

Obstacles encountered / lessons learned: Drupal

● is designed to be standalone, so we basically have two backends

● its HTML Form API can be awkward to work with if you don't want to do things the "Drupal-way"

● a pure JavaScript / HTML5 frontend might replace Drupal in upcoming versions

Page 39: Regal - a Repository for Electronic Documents and Bibliographic Data

Obstacles encountered / lessons learned: Fedora

● is more of an infrastructure than a storage system

● because of its complexity, we consider authorization via XACML a big disadvantage

● OAI-PMH is also not supported very well

● we are still looking for a more lightweight solution

● perhaps as lightweight as simply using the file system for both bitstreams and metadata

Page 40: Regal - a Repository for Electronic Documents and Bibliographic Data

Obstacles encountered / lessons learned: Elasticsearch

● Works very well with JSON-LD in general

● but needs some care to create proper mappings

● and could use a more generic notion of relations than only parent/child.

Page 41: Regal - a Repository for Electronic Documents and Bibliographic Data

Further regal applications

● Migrate further Digitool and non-Digitool repositories

● Frontend: Prototype of an OER World Map

Page 42: Regal - a Repository for Electronic Documents and Bibliographic Data

Good news: Linked Data Works!

● regal / Edoweb is not a research project,

● it is integrated into the hbz IT landscape,

● it is on the web,

● it does not require expertise in Linked Data,

● and real librarians will use it to create real catalog entries.

Page 43: Regal - a Repository for Electronic Documents and Bibliographic Data

Thank you!

Questions? Now or later to

[email protected]

[email protected]