regal - a repository for electronic documents and bibliographic data

Post on 11-May-2015

graphthinking

regal - a Repository for Electronic Documents and Bibliographic Data

Felix Ostrowski (graphthinking, @literarymachine)

Jan Schnasse (hbz, @InspektorHicks)

ELAG, June 11th 2014, University of Bath

Rationale: A new foundation for Edoweb

● A system to gather, describe and archive deposit copies of electronic publications and websites on behalf of the State Library Center of Rhineland-Palatinate (LBZ)

● Operated by the North Rhine-Westphalian Library Service Center (hbz) since 2002

● Technical evolution: OPUS – Digitool – regal

The current system and its shortcomings: Digitool

● Digitool end-of-life is coming

● Unwanted/unexpected dependencies on other projects hosted on the same Digitool instance

● Performance issues (we have millions of objects in Digitool)

● No easily configurable search indexes or OAI-PMH interfaces for single collections

● No out-of-the-box support for regional requirements (e.g. metadata delivery to the German National Library); extra money/developer hours needed

The current system and its shortcomings: Homemade

● Mix of self-developed and Ex Libris components

● Vicious circle

– introduction of workarounds

– unpredictable migration costs

– decision to stay on obsolete version

– running out of support

– introduction of workarounds

● Administrative responsibilities in different hbz working groups

Altogether, this leads to an expensive, hard-to-maintain and outdated system that satisfies neither our needs nor our clients' needs.

The following aspects are mandatory to achieve our goals

● Increase the overall performance

● Provide an up-to-date, modern user interface

● Use open source software (Fedora, Elasticsearch, Drupal)

● Seamlessly import (meta-)data from Digitool and potentially other (repository) systems

● Integrate the system with the emerging Linked-Open-Data ecosystem, especially authority data

● Loosen the tight integration with Ex Libris Aleph

● Expose (meta-)data for easy discovery & re-use by others

Overview of the new architecture

[Architecture diagram. Components shown: regal-drupal (frontend), regal (backend), Fedora, Elasticsearch, Ex Libris Aleph, lobid API]

Data model

● Simple hierarchical data model consisting of nodes associated via hasPart and partOf relations

● Each node is identified by a namespace combined with a Universally Unique Identifier (UUID)

● Each node can have a bitstream and a metadata stream

● Metadata is canonically stored as RDF N-Triples

● The bitstream can contain arbitrary data
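
The data model above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the regal implementation: the predicate URIs, the base URI and the "edoweb" namespace used in the usage note are made up, since the slides name the relations hasPart and partOf but not their vocabulary.

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical URIs: the slides only name the relations, not their vocabulary.
HAS_PART = "http://example.org/terms/hasPart"
PART_OF = "http://example.org/terms/partOf"
BASE = "http://example.org/resource/"


@dataclass
class Node:
    namespace: str
    uid: str = field(default_factory=lambda: str(uuid.uuid4()))
    # parent is excluded from repr/compare to avoid parent <-> child recursion
    parent: Optional["Node"] = field(default=None, repr=False, compare=False)
    children: list = field(default_factory=list)
    bitstream: Optional[bytes] = None  # arbitrary data, optional

    @property
    def pid(self) -> str:
        """Identifier built from the namespace and the UUID."""
        return f"{self.namespace}:{self.uid}"

    def add_child(self, child: "Node") -> "Node":
        """Link both directions: parent hasPart child, child partOf parent."""
        child.parent = self
        self.children.append(child)
        return child

    def to_ntriples(self) -> str:
        """Serialize the structural relations of this node as N-Triples."""
        subject = f"<{BASE}{self.pid}>"
        lines = [f"{subject} <{HAS_PART}> <{BASE}{c.pid}> ." for c in self.children]
        if self.parent is not None:
            lines.append(f"{subject} <{PART_OF}> <{BASE}{self.parent.pid}> .")
        return "\n".join(lines)
```

For example, `journal = Node("edoweb")` followed by `journal.add_child(Node("edoweb"))` yields one hasPart triple for the journal and one partOf triple for the child.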

Fedora (3.7.1)

● mainly used to organize and associate multiple datastreams and their versions

● provides long-term accessible data storage

● Proai is used as the OAI-PMH solution

Elasticsearch (1.1.0)

● Used to provide performant lookup (for metadata and full-text)

● Stores compacted JSON-LD

● Faceting can be used to browse the collection
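
To make the two points above concrete, here is a rough sketch of what a compacted JSON-LD record and a facet request might look like. The record, its context and its field names are invented for illustration, and the request uses the terms aggregation that Elasticsearch has offered since 1.0; whether regal uses aggregations or the older facets API is not stated in the slides.

```python
import json

# A made-up record in compacted JSON-LD form (context and fields are assumptions).
record = {
    "@context": {"dc": "http://purl.org/dc/terms/"},
    "@id": "http://example.org/resource/edoweb:1234",
    "dc:title": "An example report",
    "dc:creator": {"@id": "http://example.org/authority/person/1"},
    "dc:type": "monograph",
}


def facet_request(field_name, size=10):
    """Build a search body that returns only bucket counts for one field."""
    return {
        "size": 0,  # no hits, only the aggregation result
        "query": {"match_all": {}},
        "aggs": {field_name: {"terms": {"field": field_name, "size": size}}},
    }


# The JSON string that would be sent as the body of a search request.
body = json.dumps(facet_request("dc:type"))
```

Each bucket returned for such a request corresponds to one facet value that a browsing interface could offer as a link.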

Backend / API

● Java Web API (RESTful) implemented with Jersey

● Abstracts access to storage & indexing, transparently updates Fedora and different Elasticsearch indexes

● Provides resources as OAI-ORE aggregations
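
The idea of one API that hides storage and indexing can be sketched as a small facade. The real backend is a Java/Jersey web API; the Python classes below are hypothetical in-memory stand-ins that only show the pattern of a single write being propagated to the store and to every index.

```python
class InMemoryStore:
    """Stand-in for the object store (Fedora in the talk)."""

    def __init__(self):
        self.objects = {}

    def put(self, pid, metadata):
        self.objects[pid] = metadata

    def delete(self, pid):
        self.objects.pop(pid, None)


class InMemoryIndex:
    """Stand-in for one search index (Elasticsearch in the talk)."""

    def __init__(self):
        self.docs = {}

    def index(self, pid, doc):
        self.docs[pid] = doc

    def delete(self, pid):
        self.docs.pop(pid, None)


class Repository:
    """Facade: callers never talk to the store or the indexes directly."""

    def __init__(self, store, indexes):
        self.store = store
        self.indexes = list(indexes)

    def update(self, pid, metadata):
        # Write to storage first, then refresh every index.
        self.store.put(pid, metadata)
        for idx in self.indexes:
            idx.index(pid, metadata)

    def delete(self, pid):
        self.store.delete(pid)
        for idx in self.indexes:
            idx.delete(pid)
```

A production version would also need to decide what happens when the store write succeeds but an index update fails; the sketch ignores that case.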

Drupal Frontend

● Re-use of common features

– User management

– Template-system

– Field API

– RDF Mappings

– HTML-Form API

● Extended with custom modules for

– Storage Backend

– Linked Data Fields

– JavaScript UI enhancements

No big surprises for plaintext input...

Catalinking

Simple lookup widget with configurable data sources (currently only the lobid API is implemented)

Additional linked data is integrated on the fly

Client-side sorting (and soon also searching) of linked data

Exposing data

Importing data

This is simply a shortcut; any linked data URI can be used.

Tada!

Managing structure

Possible child nodes: in the case of a monograph these are only files. Journals provide more complex structures (volumes, issues, articles).
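
One way to express which child nodes a given node may have is a simple lookup table. The nesting order below (journal, volume, issue, article, file) is an assumption; the slides list the journal levels but do not spell out the exact rules.

```python
# Assumed nesting rules: parent type -> set of permitted child types.
ALLOWED_CHILDREN = {
    "monograph": {"file"},
    "journal": {"volume"},
    "volume": {"issue"},
    "issue": {"article"},
    "article": {"file"},
    "file": set(),
}


def can_contain(parent_type, child_type):
    """Return True if a node of parent_type may hold a child of child_type."""
    return child_type in ALLOWED_CHILDREN.get(parent_type, set())
```

A frontend could use such a table to decide which "add child" options to show for the node currently being edited.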

Basic technical metadata added by the backend.

Move an object by setting its new parent.
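
Moving by re-parenting could be sketched as follows. The class and function names are hypothetical, and the cycle check is an addition of this sketch rather than something the slides describe.

```python
class Node:
    def __init__(self, pid, parent=None):
        self.pid = pid
        self.parent = None
        self.children = []
        if parent is not None:
            move(self, parent)


def ancestors(node):
    """Yield the parent, grandparent, ... of a node up to the root."""
    while node.parent is not None:
        node = node.parent
        yield node


def move(node, new_parent):
    """Detach node from its current parent and attach it to new_parent."""
    if new_parent is node or node in ancestors(new_parent):
        raise ValueError("cannot move a node into itself or its own subtree")
    if node.parent is not None:
        node.parent.children.remove(node)  # drop the old hasPart link
    node.parent = new_parent               # set the new partOf link
    new_parent.children.append(node)       # add the new hasPart link
```

Because only the moved node and its old and new parents are touched, the whole subtree below the node moves with it.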

Faceted search, brought to us by Elasticsearch

Facets can be added and removed individually.

Anybody can say anything about anything...

Local views on remote resources, e.g. authors and classifications.

Obstacles encountered / lessons learned: Drupal

● is designed to be standalone, so we basically have two backends

● its HTML Form API can be awkward to work with if you don't want to do things the "Drupal-way"

● a pure JavaScript / HTML5 frontend might replace Drupal in upcoming versions

Obstacles encountered / lessons learned: Fedora

● is more of an infrastructure than a storage system

● because of its complexity, we consider authorization via XACML a big disadvantage

● OAI-PMH is also not supported very well

● we are still looking for a more lightweight solution

● perhaps as lightweight as simply using the file system for both bitstreams and metadata
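
As a thought experiment for the last point, a file-system layout could be as small as one directory per node. Everything here is an assumption (directory naming, the file names `metadata.nt` and `data`), and the sketch deliberately leaves out versioning, which is one of the things Fedora is used for in the current setup.

```python
from pathlib import Path


def node_dir(root, pid):
    """Map an identifier like 'namespace:uuid' to a directory path."""
    # Colons are awkward in file names on some systems, so swap them out.
    return Path(root) / pid.replace(":", "_")


def save_node(root, pid, metadata_nt, bitstream=None):
    """Write the metadata stream and, if present, the bitstream."""
    directory = node_dir(root, pid)
    directory.mkdir(parents=True, exist_ok=True)
    (directory / "metadata.nt").write_text(metadata_nt, encoding="utf-8")
    if bitstream is not None:
        (directory / "data").write_bytes(bitstream)
    return directory


def load_node(root, pid):
    """Return (metadata, bitstream); bitstream is None if no file exists."""
    directory = node_dir(root, pid)
    data_file = directory / "data"
    metadata = (directory / "metadata.nt").read_text(encoding="utf-8")
    bitstream = data_file.read_bytes() if data_file.exists() else None
    return metadata, bitstream
```

Calling `save_node(root, pid, triples, pdf_bytes)` and later `load_node(root, pid)` round-trips both streams without any repository software in between.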

Obstacles encountered / lessons learned: Elasticsearch

● Works very well with JSON-LD in general

● but needs some care to create proper mappings

● and could use a more generic notion of relations than only parent/child.
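
A typical example of the care mappings need: URI-valued fields such as the JSON-LD `@id` are usually better indexed as a single term than tokenized. The fragment below is a sketch in Elasticsearch 1.x syntax with an invented type name and invented field names; it is not the mapping regal actually uses.

```python
def uri_field():
    """Mapping for a field holding a URI that should be matched exactly."""
    # In Elasticsearch 1.x, "not_analyzed" keeps the whole string as one term.
    return {"type": "string", "index": "not_analyzed"}


mapping = {
    "record": {  # hypothetical type name
        "properties": {
            "@id": uri_field(),
            "dc:creator": {"properties": {"@id": uri_field()}},
            "dc:title": {"type": "string"},  # analyzed for full-text search
        }
    }
}
```

With such a mapping, faceting on `dc:creator.@id` returns whole URIs as buckets instead of URI fragments.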

Further regal applications

● Migrate further Digitool and non-Digitool repositories

● Frontend: Prototype of an OER World Map

Good news: Linked Data Works!

● regal / Edoweb is not a research project,

● it is integrated into the hbz IT landscape,

● it is on the web,

● it does not require expertise in Linked Data,

● and real librarians will use it to create real catalog entries.

Thank you!

Questions? Now or later to

felix.ostrowski@gmail.com

schnasse@hbz-nrw.de
