
  • From MARC-XML to JSON-LD: A New Invenio Data Model for Bibliographic Objects and Beyond

    Johnny Mariéthoz

    HEG-Genève, 2016/08/04

  • RERO DOC

    Réseau des bibliothèques Suisse occidentale, HEG-Genève, 2016/08/04

  • RERO DOC: the RERO Invenio Instance

    - RERO digital library
    - started in 2004
    - 35,000 documents
    - 215,000 print media issues
    - 44 institutions
    - content: heritage and scholarly documents
    - based on Invenio 1.x with patches

  • RERO Customizations

    - Web design
    - first-page design with content information
    - purpose of Elasticsearch:
      - hierarchical facets navigation
      - search results highlighting
      - press page
      - multilingual full-text search
    - HTML templates introduction
    - document viewer Multivio - an e-lib.ch project (http://www.multivio.org)
    - visitor statistics page
    - first JSON-LD/schema.org version

    thanks to the Invenio team

  • RERO DOC before 2013

  • RERO DOC Home Page

  • RERO DOC Search Results Page

  • RERO DOC Digitized Press Page and Multivio

  • RERO DOC Visitor Statistics Page

  • RERO DOC: New Challenges

    - software maintenance (over Invenio versions)
    - new submission interface with data import capabilities
    - new services: REST API
    - Invenio 3
    - Linked Open Data at RERO (http://data.rero.ch)
      - at the center of our future data model
      - RERO DOC as a proof of concept
      - focus on internal and external data linking through extensive use of identifiers (ORCID, etc.)
      - authority records
      - to be applied to the Union Catalog as well

    Invenio 3 with a new data model!

  • New Software

    New Data Model?


  • Why Not MARC?


  • Is MARC Too Old?

    By 1971, MARC formats had become the national standard for dissemination of bibliographic data in the United States (Wikipedia)

    1972 C programming language is released (http://computerhistory.org)

    Late 1980s Python project is started (Wikipedia)

    1989 Berners-Lee, Tim. "Information Management: A Proposal" (Wikipedia)

    1990 HTML, URL, HTTP (Wikipedia)

    1994 HTML 1.0 (Wikipedia)

    1994 Netscape 1.0 (Wikipedia)

    1998 Google (Wikipedia)

    2000 Python 2.0 is released (Wikipedia)

    2002 First CDSWare Release (Wikipedia)

    2002 http://json.org started (Wikipedia)

    2005, 2006 JSON is used by Yahoo and Google (Wikipedia)

    2006 First Invenio Release (Wikipedia)

    2014 RFC 7159 became the main reference for JSON's Internet uses (Wikipedia)

  • What Has Changed at the Data Level?

    - THE WEB
    - data handling has improved greatly with new programming languages
    - Web 2.0 applications (client-server, services, etc.) with a lot of interactions
    - emergence of Linked Open Data: everyone wants to connect to your data
    - more and more exchange formats, driven by Zotero, OAI-PMH, social networks, search engines, etc.
    - developers spend their time converting the data

  • MARC Was Designed for the Machines of the 70s!

    What About Modern Machines?

  • Object Oriented Data Model

    Person
      + id: int
      + first name: string
      + last name: string

    BibRecord
      + id: int
      + title: string
      + authors: list

    BookRecord is a BibRecord
    Author is a Person
    BibRecord has a Author
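The class diagram on this slide can be sketched with Python dataclasses. This is only an illustration: the snake_case attribute names and the author ids are assumptions made here, and the slide does not prescribe any implementation.

```python
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Person:
    id: int
    first_name: str
    last_name: str


@dataclass
class Author(Person):
    """An Author is a Person."""


@dataclass
class BibRecord:
    id: int
    title: str
    # "has a" relation: a record holds a list of authors
    authors: List[Author] = field(default_factory=list)


@dataclass
class BookRecord(BibRecord):
    """A BookRecord is a BibRecord."""


# Author ids 1 and 2 are made up for the example.
book = BookRecord(
    id=1234,
    title="From MARC to JSON",
    authors=[
        Author(id=1, first_name="Henriette", last_name="Avram"),
        Author(id=2, first_name="Douglas", last_name="Crockford"),
    ],
)

# asdict() turns the object graph into nested dictionaries and lists.
book_as_dict = asdict(book)
print(book_as_dict)
```

Converting the objects with `asdict` already yields the nested dictionary-and-list shape that the following slides build up step by step.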

  • MARC Format

    1234

    From MARC to JSON

    Avram, Henriette

    Crockford, Douglas
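The markup of the MARC sample on this slide did not survive; only its values remain. As a hedged illustration, the sketch below assumes the record was MARC-XML using the usual MARC 21 tags (001 for the record id, 245 $a for the title, 100 $a and 700 $a for the authors) and reads it into a plain Python dictionary. The sample record and the helper names `subfield_a` and `marc_to_dict` are assumptions, not part of the talk.

```python
import xml.etree.ElementTree as ET

# Assumed MARC-XML rendering of the values shown on the slide.
MARCXML = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <controlfield tag="001">1234</controlfield>
  <datafield tag="100" ind1=" " ind2=" ">
    <subfield code="a">Avram, Henriette</subfield>
  </datafield>
  <datafield tag="245" ind1=" " ind2=" ">
    <subfield code="a">From MARC to JSON</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="a">Crockford, Douglas</subfield>
  </datafield>
</record>"""

NS = {"marc": "http://www.loc.gov/MARC21/slim"}


def subfield_a(record, tag):
    """Return the text of every $a subfield of the given datafield tag."""
    path = f"marc:datafield[@tag='{tag}']/marc:subfield[@code='a']"
    return [element.text for element in record.findall(path, NS)]


def marc_to_dict(xml_text):
    """Convert the assumed MARC-XML record into a simple dictionary."""
    record = ET.fromstring(xml_text)
    recid = record.find("marc:controlfield[@tag='001']", NS).text
    authors = []
    for name in subfield_a(record, "100") + subfield_a(record, "700"):
        # "Avram, Henriette" -> last name "Avram", first name "Henriette"
        last, _, first = name.partition(", ")
        authors.append({"lastname": last, "firstname": first})
    return {
        "id": int(recid),
        "title": subfield_a(record, "245")[0],
        "authors": authors,
    }


converted = marc_to_dict(MARCXML)
print(converted)
```

Even for four values, the reader has to know what the numeric tags and subfield codes mean, which is the point the next slides make about modern data structures.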


  • Computer Data Structures

    Base Types (value)

    title = "From MARC to JSON"
    _id = 1234
    value = 12.3

  • Computer Data Structures

    List (array)

    authors = [
        "Henriette Avram",
        "Douglas Crockford"
    ]

  • Computer Data Structures

    Dictionary (object)

    author = {
        "lastname": "Avram",
        "firstname": "Henriette"
    }

  • Computer Data Structures

    All Together

    {
        "id": 1234,
        "title": "From MARC to JSON",
        "authors": [
            {
                "lastname": "Avram",
                "firstname": "Henriette"
            }, {
                "lastname": "Crockford",
                "firstname": "Douglas"
            }
        ]
    }

  • Computer Data Structures

    Output Format

    {
        "id": 1234,
        "title": "From MARC to JSON",
        "authors": [
            {
                "lastname": "Avram",
                "firstname": "Henriette"
            }, {
                "lastname": "Crockford",
                "firstname": "Douglas"
            }
        ]
    }
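The point of these slides, that the in-memory structure and the output format look the same, can be tried with Python's standard `json` module. A minimal sketch; the variable names are chosen here for illustration.

```python
import json

record = {
    "id": 1234,                    # base type: integer
    "title": "From MARC to JSON",  # base type: string
    "authors": [                   # list (array)
        {"lastname": "Avram", "firstname": "Henriette"},  # dictionary (object)
        {"lastname": "Crockford", "firstname": "Douglas"},
    ],
}

# Serialize the structure to JSON text, then read it back.
text = json.dumps(record, indent=2)
restored = json.loads(text)

print(text)
print(restored["authors"][1]["lastname"])
```

The round trip is lossless for this record: the parsed result compares equal to the original dictionary.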

  • The JSON Format


  • Interesting Features

    - simple: value, array, object
    - easy to
      - read and write
      - share between client and server (Python, JavaScript)
      - share (REST API)
      - work with existing libraries (Elasticsearch, PostgreSQL)
    - can represent any kind of object (comments, notes, tags, libraries, collections, etc.)
    - supported by many programming languages
    - human readable (debug, understand)
    - widely used on the Web
    - (too?) flexible

  • Missing Features

    - standard naming (creators, authors, etc.)
    - data validation
    - clear format description: human and machine

    JSON Schema

  • JSON Schema


  • The Concept

    Data (JSON)
    Schema (JSON)
    + Validation, Ingestion, Quality Control

    Editor Config (JSON)
    Editor: Schema Form (JavaScript)
    + Web Editor with validation

  • JSON Schema Advantages

    - describes your existing data format
    - clear, human- and machine-readable documentation
    - complete structural validation, useful for
      - automated testing
      - validating client-submitted data

  • Example

    Person Schema

        {
          "$schema": "http://json-schema.org/schema#",
          "id": "/schemas/person-v1.0.0.json",
          "title": "Person",
          "description": "A Physical Person",
          "type": "object",
          "properties": {
            "firstName": {"type": "string"},
            "lastName": {"type": "string"},
            "age": {
              "description": "Age in years",
              "type": "integer",
              "minimum": 18
            }
          },
          "required": ["firstName", "lastName"]
        }

    Valid Person Data

        {
          "firstName": "Henriette",
          "lastName": "Avram",
          "age": 55
        }

    Invalid Person Data

        {
          "lastName": "Avram",
          "age": 10
        }
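To see why the second record is rejected, the sketch below checks both records against the Person schema. It is a deliberately tiny, hand-written checker that understands only the keywords used on this slide (`type`, `required`, `minimum`); it is not a JSON Schema implementation, and a real project would rely on an existing validator library instead. The function name `check` and the error messages are assumptions.

```python
PERSON_SCHEMA = {
    "$schema": "http://json-schema.org/schema#",
    "id": "/schemas/person-v1.0.0.json",
    "title": "Person",
    "description": "A Physical Person",
    "type": "object",
    "properties": {
        "firstName": {"type": "string"},
        "lastName": {"type": "string"},
        "age": {"description": "Age in years", "type": "integer", "minimum": 18},
    },
    "required": ["firstName", "lastName"],
}

# Only the three types that appear in the Person schema are covered.
TYPE_CHECKS = {
    "string": lambda v: isinstance(v, str),
    "integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "object": lambda v: isinstance(v, dict),
}


def check(data, schema):
    """Return a list of error messages; an empty list means 'valid'."""
    if not TYPE_CHECKS[schema["type"]](data):
        return [f"expected {schema['type']}"]
    errors = []
    for name in schema.get("required", []):
        if name not in data:
            errors.append(f"'{name}' is required")
    for name, rules in schema.get("properties", {}).items():
        if name not in data:
            continue  # optional property that is absent
        value = data[name]
        if not TYPE_CHECKS[rules["type"]](value):
            errors.append(f"'{name}' must be of type {rules['type']}")
        elif "minimum" in rules and value < rules["minimum"]:
            errors.append(f"'{name}' must be >= {rules['minimum']}")
    return errors


valid_person = {"firstName": "Henriette", "lastName": "Avram", "age": 55}
invalid_person = {"lastName": "Avram", "age": 10}

valid_errors = check(valid_person, PERSON_SCHEMA)
invalid_errors = check(invalid_person, PERSON_SCHEMA)
print(valid_errors)
print(invalid_errors)
```

The invalid record fails twice: `firstName` is required but missing, and `age` is below the schema's minimum of 18.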

  • JSON Editor - Angular Form Editor

    Person Schema

        {
          "$schema": "http://json-schema.org/schema#",
          "id": "/schemas/person-v1.0.0.json",
          "title": "Person",
          "description": "A Physical Person",
          "type": "object",
          "properties": {
            "firstName": {"type": "string"},
            "lastName": {"type": "string"},
            "age": {
              "description": "Age in years",
              "type": "integer",
              "minimum": 18
            }
          },
          "required": ["firstName", "lastName"]
        }

    Editor

    Editor Configuration

        [
          {"key": "firstName", "placeholder": "please enter..."},
          {"key": "lastName", "placeholder": "please enter..."},
          "age",
          {"key": "comment", "type": "textarea", "placeholder": "Make a comment"},
          {"type": "submit", "style": "btn-info", "title": "Submit"}
        ]

    Form Data

    http://schemaform.io/examples/bootstrap-example.html#/44e5c966452dac5f5e19
    http://schemaform.io/examples/bootstrap-example.html#/64552997a5e41ac53322
    http://schemaform.io/examples/bootstrap-example.html#/5a32e594e552fa45ec14

  • Exporting your Data

    JSON-LD


  • The Concept

    JSON (Local) + @context (Mapping)

    JSON-LD (RDF)

    RDFa, RDF/XML, N3, Turtle

  • Your Data in the LOD Cloud

    Invenio Instance

    JSON-LD

  • JSON Editor

    Book

        {
          "recid": "1234",
          "title": "From Marc to JSON",
          "authors": [
            {"name": "Crockford, Douglas 1955-"},
            {"uri": "http://viaf.org/viaf/18236820"}
          ]
        }

    @context

        "@context": {
          "dc": "http://purl.org/dc/elements/1.1/",
          "dct": "http://purl.org/dc/terms/",
          "@base": "http://doc.rero.ch/record/",
          "recid": "@id",
          "uri": "@id",
          "name": "@value",
          "title": "dct:title",
          "authors": "dc:creator"
        }

    JSON-LD

    http://json-ld.org/playground/#startTab=tab-compacted&json-ld=%7B%22%40context%22%3A%7B%22dc%22%3A%22http%3A%2F%2Fpurl.org%2Fdc%2Felements%2F1.1%2F%22%2C%22dct%22%3A%22http%3A%2F%2Fpurl.org%2Fdc%2Fterms%2F%22%2C%22%40base%22%3A%22http%3A%2F%2Fdoc.rero.ch%2Frecord%2F%22%2C%22recid%22%3A%22%40id%22%2C%22uri%22%3A%22%40id%22%2C%22name%22%3A%22%40value%22%2C%22title%22%3A%22dct%3Atitle%22%2C%22authors%22%3A%22dc%3Acreator%22%7D%2C%22recid%22%3A%221234%22%2C%22title%22%3A%22From%20Marc%20to%20Json%22%2C%22authors%22%3A%5B%7B%22name%22%3A%22Crockford%2C%20Douglas%201955-%22%7D%2C%7B%22uri%22%3A%22http%3A%2F%2Fviaf.org%2Fviaf%2F18236820%22%7D%5D%7D&context=%7B%22%40context%22%3A%7B%22dc%22%3A%22http%3A%2F%2Fpurl.org%2Fdc%2Felements%2F1.1%2F%22%2C%22dct%22%3A%22http%3A%2F%2Fpurl.org%2Fdc%2Fterms%2F%22%2C%22%40base%22%3A%22http%3A%2F%2Fdoc.rero.ch%2Frecord%2F%22%2C%22recid%22%3A%22%40id%22%2C%22uri%22%3A%22%40id%22%2C%22name%22%3A%22%40value%22%2C%22title%22%3A%22dct%3Atitle%22%2C%22authors%22%3A%22dc%3Acreator%22%7D%7D
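What the `@context` does to the Book record can be sketched in a few lines of Python. The code below is a hand-written simplification that handles only the constructs in this example (prefixes, keyword aliases and `@base`); it is not the JSON-LD expansion algorithm, and a real JSON-LD processor should be used in practice. The function names are assumptions.

```python
import json
from urllib.parse import urljoin

CONTEXT = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "dct": "http://purl.org/dc/terms/",
    "@base": "http://doc.rero.ch/record/",
    "recid": "@id",
    "uri": "@id",
    "name": "@value",
    "title": "dct:title",
    "authors": "dc:creator",
}

BOOK = {
    "recid": "1234",
    "title": "From Marc to JSON",
    "authors": [
        {"name": "Crockford, Douglas 1955-"},
        {"uri": "http://viaf.org/viaf/18236820"},
    ],
}


def resolve_term(term, context):
    """Map a local key to a JSON-LD keyword or to a full IRI."""
    target = context.get(term, term)  # unmapped keys are kept as-is here
    if target.startswith("@"):
        return target  # keyword alias such as "@id" or "@value"
    prefix, sep, suffix = target.partition(":")
    if sep and prefix in context and not suffix.startswith("//"):
        return context[prefix] + suffix  # e.g. "dct:title" -> full IRI
    return target


def expand_node(node, context):
    """Rewrite one local JSON object using the context mapping."""
    expanded = {}
    for key, value in node.items():
        term = resolve_term(key, context)
        if term == "@id":
            # Relative identifiers are resolved against @base.
            expanded["@id"] = urljoin(context.get("@base", ""), value)
        elif term == "@value":
            expanded["@value"] = value
        else:
            items = value if isinstance(value, list) else [value]
            expanded[term] = [
                expand_node(item, context) if isinstance(item, dict)
                else {"@value": item}
                for item in items
            ]
    return expanded


expanded = expand_node(BOOK, CONTEXT)
print(json.dumps(expanded, indent=2))
```

The local keys disappear in the result: `recid` becomes an `@id` under the `@base` URL, `title` becomes the Dublin Core Terms title property, and each author is either a literal value or a link to a VIAF identifier.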

  • Summary

    - JSON Data
      - simple
      - powerful
      - portable
    - JSON-Schema Framework
      - validation
      - HTML form generation
    - JSON-LD Mapping
      - lightweight data exchange

    And MARC? Full forward/backward compatibility: MARC <-> JSON
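As a hedged illustration of the backward direction, the sketch below writes a small JSON record out as MARC-XML again. It assumes the usual MARC 21 tags (001 for the id, 100 $a for the first author, 245 $a for the title, 700 $a for further authors); the helper names are made up for this example and do not come from Invenio or any other library.

```python
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"
# Serialize with a default namespace instead of generated "ns0:" prefixes.
ET.register_namespace("", MARC_NS)


def add_datafield(record, tag, value):
    """Append a datafield with blank indicators and a single $a subfield."""
    datafield = ET.SubElement(
        record, f"{{{MARC_NS}}}datafield", {"tag": tag, "ind1": " ", "ind2": " "}
    )
    subfield = ET.SubElement(datafield, f"{{{MARC_NS}}}subfield", {"code": "a"})
    subfield.text = value


def format_name(author):
    """{"lastname": "Avram", "firstname": "Henriette"} -> "Avram, Henriette"."""
    return f"{author['lastname']}, {author['firstname']}"


def json_to_marcxml(data):
    """Serialize the simple JSON record as a MARC-XML string."""
    record = ET.Element(f"{{{MARC_NS}}}record")
    control = ET.SubElement(record, f"{{{MARC_NS}}}controlfield", {"tag": "001"})
    control.text = str(data["id"])
    authors = data.get("authors", [])
    if authors:
        add_datafield(record, "100", format_name(authors[0]))
    add_datafield(record, "245", data["title"])
    for author in authors[1:]:
        add_datafield(record, "700", format_name(author))
    return ET.tostring(record, encoding="unicode")


record = {
    "id": 1234,
    "title": "From MARC to JSON",
    "authors": [
        {"lastname": "Avram", "firstname": "Henriette"},
        {"lastname": "Crockford", "firstname": "Douglas"},
    ],
}

marcxml = json_to_marcxml(record)
print(marcxml)
```

A mapping this small is trivially reversible; with real records, compatibility depends on every MARC field and subfield having a defined place in the JSON model.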

  • A New Data Model Based on JSON


  • The Current Model

    Core Library: Internal Representation

    - JSON-LD: schema.org (Google), RERO-LD
    - HTML/XML: Frontend, Scholar, OpenGraph, unAPI (Zotero), Facebook, Twitter, OAI-PMH server
    - ASCII: BibTeX
    - REST API: JSON
    - Indexer
    - Storage
    - Submission Interface
    - External Source: MARC21 via Z39.50, XMLMARC via OAI-PMH

    Diagram labels: MARC, Python Code, Python / XSLT, Complex Library

  • The New Data Model

    Core Library: Internal Representation

    - JSON-LD: schema.org (Google), RERO-LD
    - HTML/XML: Frontend, Scholar, OpenGraph, unAPI (Zotero), Facebook, Twitter, OAI-PMH server
    - ASCII: BibTeX
    - REST API: JSON
    - Indexer
    - Storage
    - Submission Interface
    - External Source: MARC21 via Z39.50, XMLMARC via OAI-PMH

    Diagram labels: MARC, Json, Python Code, None, HTML Template, @context, schemaform, JSON, easy, dojson

  • Conclusion


  • Conclusion

    - Invenio 3 opens new perspectives
    - JSON is the obvious choice for the Web
    - still MARC compatible
    - data conversion is more affordable, more robust and easier to maintain
    - developers may focus on new developments
    - librarians may take full control of data modeling and exchange by learning JSON-Schema and JSON-LD

  • References

    - RERO DOC: http://doc.rero.ch
    - Invenio: http://invenio-software.org/
    - JSON-LD: http://json-ld.org/
    - JSON Schema: http://json-schema.org/
    - RERO LOD: http://data.rero.ch
    - Elasticsearch: https://www.elastic.co
    - Angular Form Editor: http://schemaform.io/
