Digital Library

J-Y Le Meur

Digital Library: Definitions, Examples, Standards, Summary

Digital Library Software: Specialized SW - History, Invenio features, Invenio modular architecture

Technology: Overview, Python, Development environment, Building efficient Indexes

Conclusion

Introduction to Digital Library Technology: The INVENIO software

J-Y. Le Meur

Department of Information Technology, CERN

24-10-2011 / JINR-CERN School on GRID and Information Management Systems

Outline

1 Digital Library: Definitions, Examples, Standards, Summary

2 Digital Library Software: Specialized SW - History, Invenio features, Invenio modular architecture

3 Technology: Overview, Python, Development environment, Building efficient Indexes

4 Conclusion

What is a Digital Library?

A library in which collections are stored in digital formats (as opposed to print, microform, or other media) and accessible by computers. (...) A digital library is a type of information retrieval system.

A virtual organisation that comprehensively collects, manages and preserves for the long term rich digital content, and offers to its targeted user communities specialised functionality on that content.

(1) institutional document repositories
(2) world-wide subject-based information systems

Some key notions

Repository type: institutional versus disciplinary
Hybrid libraries: electronic resources versus traditional print material
Content type: born digital versus converted content
Archive concept: traditional archive versus digital archive
Library type: digital versus virtual
Open access: Green versus Gold

What is a Digital Library? Example 1: CERN Document Server

managing CERN and selected non-CERN high-energy physics and related documents since 1993
more than 1,000,000 records
articles, books, theses, photos, videos, and more
powered by Invenio, free digital library software
http://cdsweb.cern.ch/

CDS: Collection tree

CDS: Search for Books

CDS: Search for photos

CDS Feature: Commenting

CDS Feature: Create Personal Alert

CDS Feature: Add to Basket

CDS Feature: Display Personal Basket

CDS Feature: Organise and Share Baskets

CDS: Journals and Bulletins

What is a Digital Library? Example 2: INSPIRE

world-wide high-energy physics information system
run by CERN, DESY, FNAL, SLAC
metadata curation since 1960s, Invenio technology since 2007
citation analysis, author/affiliation analysis
close partnership with arXiv and ADS
http://inspirehep.org/

INSPIRE: full-text search

INSPIRE: Cite Summary

INSPIRE: Citation History

INSPIRE: Author pages

What is a Digital Library? Example 3: The JDS Digital Library: jdsweb.jinr.ru

Other famous examples?

Are these digital libraries?
Google Web | Books | Scholar?
Eprint arXiv
Library of Congress and American Memory
Internet Archive

World’s total yearly production of print, film, optical, and magnetic content would require roughly 1.5 billion gigabytes of storage

Library Standards: exchange, identifiers and preservation

Exchange protocols: Z39.50 and OAI-PMH, between Data and Service providers
Interoperability: SWORD = Simple Web-service Offering Repository Deposit
Identifiers: ISBN and DOI
Preservation: METS, PDF/A, OAIS
  Content description: Metadata Encoding and Transmission Standard
  Data formats
  Supporting system: Open Archival Information System ref. model
Content representation: MARC, DC
  XML-MARC
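To make the exchange between data and service providers more concrete, the sketch below builds an OAI-PMH `ListRecords` request URL of the kind a service provider (harvester) would send to a data provider. The argument names (`verb`, `metadataPrefix`, `from`, `set`) and the `oai_dc` prefix come from the OAI-PMH specification; the base URL `https://repository.example.org/oai2d` and the helper name `build_list_records_url` are assumptions made for illustration, and no request is actually sent.

```python
from urllib.parse import urlencode


def build_list_records_url(base_url, metadata_prefix="oai_dc", from_date=None, set_spec=None):
    """Return an OAI-PMH ListRecords request URL (nothing is fetched)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date is not None:
        params["from"] = from_date  # selective harvesting by datestamp
    if set_spec is not None:
        params["set"] = set_spec  # selective harvesting by set
    return base_url + "?" + urlencode(params)


# Hypothetical endpoint, used only for illustration.
url = build_list_records_url("https://repository.example.org/oai2d", from_date="2011-10-01")
print(url)
```

A harvester would typically repeat such a request with a later `from` date to pick up only the records added or changed since its previous run.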

Library Standards: Content representation

Metadata: data about data
Metadata types: descriptive, structural and administrative
Metadata schema: set of defined elements (e.g. MARC, DC)
MARC: MAchine Readable Cataloguing, international standard for representing and communicating bibliographic records, developed in the 60s, catalogue card oriented, high degree of complexity to cover all purposes
XML-MARC: XML schema based on MARC21, developed by Library of Congress

MARC and XML-MARC examples

Tags, identifiers and subcodes

001__ 1337270
037__ $$aCERN-PH-EP-2011-030
100__ $$aClerbaux, Barbara $$eed. $$iINSPIRE-00314890 $$uBrussels U.
245__ $$aSearch for New Physics in Dijet
260__ $$c2011
520__ $$aA search for new interactions and resonances [..]

XML-MARC: tag 100

<datafield tag="100" ind1=" " ind2=" ">
  <subfield code="a">Clerbaux, Barbara</subfield>
  <subfield code="e">ed.</subfield>
  <subfield code="i">INSPIRE-00314890</subfield>
  <subfield code="u">Brussels U.</subfield>
</datafield>
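The correspondence between the two notations can be sketched in Python. The snippet parses one line of the text notation shown above (tag, two indicators, `$$`-prefixed subfields) and serializes it as an XML-MARC style `datafield` element using only the standard library. The function names are hypothetical, underscores are assumed to stand for blank indicators, and this is an illustration rather than Invenio's own converter.

```python
import xml.etree.ElementTree as ET


def parse_marc_line(line):
    """Split a line such as '037__ $$aCERN-PH-EP-2011-030' into its parts."""
    head, _, rest = line.partition(" ")
    tag, ind1, ind2 = head[:3], head[3:4], head[4:5]
    subfields = []
    for chunk in rest.split("$$")[1:]:  # text before the first $$ is ignored
        if chunk:
            subfields.append((chunk[0], chunk[1:].strip()))
    return tag, ind1, ind2, subfields


def to_datafield(tag, ind1, ind2, subfields):
    """Build a <datafield> element; '_' is assumed to mean a blank indicator."""
    field = ET.Element("datafield", {
        "tag": tag,
        "ind1": " " if ind1 == "_" else ind1,
        "ind2": " " if ind2 == "_" else ind2,
    })
    for code, value in subfields:
        ET.SubElement(field, "subfield", {"code": code}).text = value
    return field


line = "100__ $$aClerbaux, Barbara $$eed. $$iINSPIRE-00314890 $$uBrussels U."
tag, ind1, ind2, subfields = parse_marc_line(line)
xml_text = ET.tostring(to_datafield(tag, ind1, ind2, subfields), encoding="unicode")
print(xml_text)
```

Control fields such as `001`, which carry a bare value instead of subfields, would need separate handling; this sketch simply returns an empty subfield list for them.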

Digital Library: Summary

Definitions of a digital library
The variety of types and concepts behind "Digital Library"
Examples of institutional and subject-based repositories
Some functionalities of Digital Libraries
Some important standards: MARC, SWORD, OAI-PMH
Next: the need for specialized software

Why specialized Software?

Specialist software for building, maintaining, managing or running digital libraries.

Institutional repository software focuses primarily on ingest, preservation and access of locally produced documents, particularly locally produced academic outputs.

Content is organized and ready for exchange (support of interoperability protocols)
Metadata and data are preserved for the long term (support of preservation standards)
Submission, editing and curation processes are supported
Dissemination is organized and controlled
SW examples: Eprints, DSpace, Fedora, Greenstone... Invenio

Why at CERN? An interesting challenge

A physicist's office

Invenio History

1954: CERN laboratory is created
1989: Tim Berners-Lee invents the Web
1991: SPIRES (SLAC) is the first database on the Web
ArXiv, the archive of Physics papers, moves to the Web
1993: CERN Preprint Server starts as an institutional and disciplinary repository
1996: CERN Library Server includes Books and Periodicals, as a hybrid library
2000: CERN Document Server includes Multimedia material and restricted notes
2002: CDSWare SW released open source
2006: CDSWare becomes Invenio; start of I18N collaborations
2010: Invenio 1.0 released and adopted world-wide

Key features (invenio-software.org)

navigable collection tree (regular, virtual, hosted)
powerful search engine
  Google-like speed for up to 5M records
  combined metadata, reference and fulltext search
flexible metadata (MARC, OA)
handling any kind of document (multimedia)
customizable input, formatting and linking
personalization and collaborative features:
  alerts, baskets, groups, reviews, comments
internationalisation (28 languages)
Books management and circulation
open source, GNU General Public License
  co-developed by CERN (2002–), EPFL (2004–), DESY/FNAL/SLAC (2008–), CfA (2009–)
  installed at > 40 institutions world-wide

Extra Features: Plugins

Compatibility with

LibX: Invenio toolbar
  LibX: http://libx.org/editions/download.php?edition=4F46CD81
  Can be integrated with Internet Explorer and Firefox browsers
  Integration with the main digital content websites including Amazon, Google Scholar, Wikipedia
  Highlighted text from a web page can be used to directly query an Invenio installation
Zotero: Invenio can export its content to Zotero Firefox plugin for compiling CVs
Cooliris: Invenio supports browsing multimedia content as a 3 dimensional wall (due to the integration with the Cooliris plugin)

Modules Overview

Modules Overview: Scheduler

Monitoring and scheduling processes

Ingestion Modules: Overview

Ingestion Modules: Submission: interfaces, workflows and functions

Processing Modules: Overview

Processing Modules: Example: indexing

Processing Modules: Example: ranking

Most Cited: count citations
All-Times Best: PageRank (Google)
‘Hot’ Trends: time-aware PageRank
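The first two ranking methods can be illustrated on a toy citation graph. The sketch below counts incoming citations and computes PageRank by plain power iteration. The graph, the damping factor of 0.85 and the iteration count are illustrative assumptions, not Invenio's ranking code, and the time-aware variant is not shown.

```python
def citation_counts(cites):
    """Count incoming citations; `cites` maps each paper to the papers it cites."""
    counts = {paper: 0 for paper in cites}
    for cited_list in cites.values():
        for cited in cited_list:
            counts[cited] += 1
    return counts


def pagerank(cites, damping=0.85, iterations=50):
    """Power-iteration PageRank; papers citing nothing spread their rank evenly."""
    papers = list(cites)
    n = len(papers)
    rank = {p: 1.0 / n for p in papers}
    for _ in range(iterations):
        dangling = sum(rank[p] for p in papers if not cites[p])
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in papers}
        for p in papers:
            for cited in cites[p]:
                new_rank[cited] += damping * rank[p] / len(cites[p])
        rank = new_rank
    return rank


# Toy graph (hypothetical): A cites B and C, B cites C, C cites nothing, D cites C.
graph = {"A": ["B", "C"], "B": ["C"], "C": [], "D": ["C"]}
counts = citation_counts(graph)
ranks = pagerank(graph)
print(counts)
print(max(ranks, key=ranks.get))
```

Unlike a raw citation count, PageRank weights each citation by the rank of the citing paper, so a citation from a highly ranked paper contributes more than one from an uncited paper.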

Processing Modules: Example: ranking

inspirehep.net (500 random points)

Dissemination Modules: Overview

Dissemination Modules: Search Examples

Curation Modules: Overview

Curation Modules: Examples

Access Module: Authentication management

Access Module: Authorization management

Modules Summary: Invenio

About 33 modules
codebase
  290,000 lines of Python code
  12,000 lines of JavaScript code
  6,000 lines of XSL code
  5,000 lines of autotools code
  500 test cases
75 authors since inception
  25 authors and contributors in 2010
  many short-term students
  importance of informal coding standards
10 years of development, started at CERN, first release in 2002, now co-developed world-wide (EU, US)

Technology Overview: used technologies

Open Source GPL project
Unix/Linux server side
Python (and C and Lisp), MySQL and Apache + mod_wsgi
Other smaller dependencies
Based on open standards (MARCXML, MARC21, OAI-PMH, OpenURL...)
Medium to big data repositories
Flexible at every layer

Technology Overview: concepts

Technology Overview: languages

Why Python? Languages

easy to read and understand (good for many temporary developers)
suitable for rapid prototyping (good for organic-growth software development model)
write code to throw it away

Why Python? Art of ikebana programming

Why Python? - Speeding up Python

Development model - Git distributed environment

- good for distributed teams
- offline development possible
- “pull on demand” collaboration model (as opposed to “shared push” collaboration model)
  - inherent, natural code review process
- commit early, commit often (to private repositories)
- rebase and clean (before pushing for public consumption)
- interplay with SVN

Development model - Git collaboration model

Development model - Test Suite: unit test

Development model - Test Suite: functional test

Development model - Test Suite: web testing

Building Indexes - loading Web vs App Server

Building Indexes - load split

Building Indexes - designing a search engine

performance-driven design assumptions:
- high number of selects, low number of updates
- fast searching, slow indexation
- cache everything cacheable

search functionality:
- search for words, phrases, regular expressions
- search in any field, authors, titles, etc.

index design:
- forward indexes:
    rec1 -> [word1, word8, ...]
    rec2 -> [word1, word2, ...]
- reverse indexes:
    word1 -> [rec1, rec2, ...]
    word2 -> [rec2, rec7, ...]

Zipf’s law on word frequency:
- few words occur very often (e.g. the)
- most words are infrequent (even e.g. boson)
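
The forward and reverse indexes described on this slide can be illustrated with a minimal Python sketch. It is not Invenio's actual indexer: the toy records, the whitespace tokenizer, and the function names `build_indexes` and `search_all_words` are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical toy records (id -> text); real records carry full metadata.
records = {
    1: "higgs boson search at the lhc",
    2: "the search for supersymmetry",
    7: "boson pair production",
}


def build_indexes(records):
    """Return (forward, reverse) word indexes for a dict of id -> text."""
    forward = {}
    reverse = defaultdict(set)
    for rec_id, text in records.items():
        words = sorted(set(text.split()))  # naive whitespace tokenizer
        forward[rec_id] = words            # rec -> [word, ...]
        for word in words:
            reverse[word].add(rec_id)      # word -> {rec, ...}
    return forward, dict(reverse)


def search_all_words(reverse, query):
    """Return the ids of records that contain every word of the query."""
    sets = [reverse.get(word, set()) for word in query.split()]
    return set.intersection(*sets) if sets else set()


forward, reverse = build_indexes(records)
print(forward[2])
print(reverse["boson"])
print(search_all_words(reverse, "boson search"))
```

A multi-word query becomes an intersection of the record sets found in the reverse index, which is why the cost of intersecting sets matters for search speed.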

Building Indexes - Search engine under cover

Building Indexes - Measuring the performance

three important speed factors to consider:
- speed of finding sets (DB Server)
- speed of demarshaling sets (DB <-> Web App Server)
- speed of intersecting sets (Web App Server)
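
The three factors can be timed separately, as in the rough sketch below. It rests on stated assumptions: a plain dictionary named `fake_table` stands in for the database, `pickle` stands in for whatever serialization the real system uses, and the helper names are hypothetical. Absolute timings depend on the machine, so none are quoted.

```python
import pickle
import time

# Stand-in for a database table: word -> serialized set of record ids.
fake_table = {
    "boson": pickle.dumps(set(range(0, 100000, 2))),
    "higgs": pickle.dumps(set(range(0, 100000, 3))),
}


def timed(func, *args):
    """Call func(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


def find_sets(words):
    """Factor 1: look up the stored blob for each word."""
    return [fake_table[word] for word in words]


def demarshal_sets(blobs):
    """Factor 2: turn the stored blobs back into Python sets."""
    return [pickle.loads(blob) for blob in blobs]


def intersect_sets(sets):
    """Factor 3: intersect the sets to get the matching records."""
    return set.intersection(*sets)


blobs, t_find = timed(find_sets, ["boson", "higgs"])
sets, t_load = timed(demarshal_sets, blobs)
hits, t_and = timed(intersect_sets, sets)
print(f"find {t_find:.6f}s, demarshal {t_load:.6f}s, intersect {t_and:.6f}s")
```

Measuring the phases separately shows which one dominates before a data structure is optimized.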

Building Indexes - Optimizing data structures

data structures tested:
- ‘sorted’ (lists, Patricia trees)
- ‘unsorted’ (hashed sets, binary vectors)

fast prototyping: (Python, Lisp in 2002)

Building Indexes - Binary vectors

binary vectors found to be the best compromise!
- using Numeric Python module
- typical search time gain: 4.0 sec -> 0.2 sec
- typical indexing time loss: 7 hours -> 4 days
- mostly sparse data modelled via mostly dense data structure? free your mind, think critically

further optimization:
- Numeric module not addressing real bits, only bytes
- so home-made intbitset C extension in 2007
  - addressing real bits (factor of 8 already)
  - saving space, saving (indexing) time
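
The factor of 8 comes from storing one bit per record instead of one byte. The pure-Python sketch below illustrates the idea only; the `BitVector` class is hypothetical and is not the intbitset API, which is a C extension.

```python
class BitVector:
    """Fixed-size set of record ids stored as one bit per record."""

    def __init__(self, size):
        self.size = size
        self.bits = bytearray((size + 7) // 8)  # 8 records per byte

    def add(self, rec_id):
        self.bits[rec_id >> 3] |= 1 << (rec_id & 7)

    def __contains__(self, rec_id):
        return bool(self.bits[rec_id >> 3] & (1 << (rec_id & 7)))

    def __and__(self, other):
        """Intersect two vectors of equal size with a bytewise AND."""
        result = BitVector(self.size)
        result.bits = bytearray(a & b for a, b in zip(self.bits, other.bits))
        return result

    def to_list(self):
        return [rec_id for rec_id in range(self.size) if rec_id in self]


n_records = 1000
byte_per_record = bytearray(n_records)  # one byte per record, for comparison

boson = BitVector(n_records)
higgs = BitVector(n_records)
for rec_id in (2, 7, 512):
    boson.add(rec_id)
for rec_id in (7, 512, 999):
    higgs.add(rec_id)

both = boson & higgs
print(len(byte_per_record), len(boson.bits))
print(both.to_list())
```

Intersection reduces to a bytewise AND over a dense array, which is cheap even though most bits are zero for an infrequent word.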

Conclusion

selected lessons from building a digital library system with about 300,000 LOCs from 75 authors over 10 years:
- value of rapid prototyping
- value of organic-growth software development model
- value of coding aesthetics and minimalism

Evolution and challenges of digital libraries:
- Increase of Interoperability
- Open Access and Publishing model evolution
- The Data Continuum, connecting DLs and Science Datasets
