Introduction to Digital Library Technology: The INVENIO software

J-Y. Le Meur

Department of Information Technology, CERN

24-10-2011, JINR-CERN School on GRID and Information Management Systems


Outline

1 Digital Library: Definitions, Examples, Standards, Summary

2 Digital Library Software: Specialized SW - History, Invenio features, Invenio modular architecture

3 Technology: Overview, Python, Development environment, Building efficient Indexes

4 Conclusion

What is a Digital Library?

A library in which collections are stored in digital formats (as opposed to print, microform, or other media) and accessible by computers. (...) A digital library is a type of information retrieval system.

A virtual organisation that comprehensively collects, manages and preserves for the long term rich digital content, and offers to its targeted user communities specialised functionality on that content.

(1) institutional document repositories
(2) world-wide subject-based information systems

Some key notions

Repository type: institutional versus disciplinary
Hybrid libraries: electronic resources versus traditional print material
Content type: born digital versus converted content
Archive concept: traditional Archive versus digital Archive
Library type: digital versus virtual
Open access: Green versus Gold

What is a Digital Library? Ex 1: CERN Document Server

Example 1: CERN Document Server

managing CERN and selected non-CERN high-energy physics and related documents since 1993
more than 1,000,000 records
articles, books, theses, photos, videos, and more
powered by Invenio, free digital library software
http://cdsweb.cern.ch/

CDS: Collection tree

CDS: Search for Books

CDS: Search for photos

CDS Feature: Commenting

CDS Feature: Create Personal Alert

CDS Feature: Add to Basket

CDS Feature: Display Personal Basket

CDS Feature: Organise and Share Baskets

CDS: Journals and Bulletins

What is a Digital Library? Ex 2: INSPIRE

Example 2: INSPIRE

world-wide high-energy physics information system
run by CERN, DESY, FNAL, SLAC
metadata curation since 1960s, Invenio technology since 2007
citation analysis, author/affiliation analysis
close partnership with arXiv and ADS
http://inspirehep.org/

INSPIRE: full-text search

INSPIRE: Cite Summary

INSPIRE: Citation History

INSPIRE: Author pages

What is a Digital Library? Ex 3: The JDS Digital Library: jdsweb.jinr.ru

Other famous examples?

Are these digital libraries?
Google Web | Books | Scholar?
Eprint arXiv
Library of Congress and American Memory
Internet Archive
World's total yearly production of print, film, optical, and magnetic content would require roughly 1.5 billion gigabytes of storage

Library Standards: exchange, identifiers and preservation

Exchange protocols: Z39.50 and OAI-PMH
  between Data and Service providers
Interoperability: SWORD = Simple Web-service Offering Repository Deposit
Identifiers: ISBN and DOI
Preservation: METS, PDF/A, OAIS
  Content description: Metadata Encoding and Transmission Standard
  Data formats
  Supporting system: Open Archival Information System ref. model
Content representation: MARC, DC
  XML-MARC
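To make the exchange between data and service providers more concrete, here is a minimal sketch of how a service provider might build an OAI-PMH harvesting request. The endpoint `BASE_URL` and the helper `oai_request_url` are hypothetical names chosen for illustration; only the `ListRecords` verb and the `metadataPrefix` and `from` arguments come from the OAI-PMH protocol itself. No network call is made.

```python
from urllib.parse import urlencode

# Hypothetical repository endpoint, used for illustration only.
BASE_URL = "https://repository.example.org/oai2d"


def oai_request_url(verb, **arguments):
    """Build an OAI-PMH request URL: the verb plus its arguments as a query string."""
    params = {"verb": verb}
    params.update(arguments)
    return BASE_URL + "?" + urlencode(params)


# Ask a data provider for Dublin Core records changed since a given date.
# "from" is a Python keyword, so it is passed through a dict.
url = oai_request_url("ListRecords", metadataPrefix="oai_dc", **{"from": "2011-01-01"})
print(url)
```

A real harvester would also need to follow the resumption tokens that a data provider returns for large result sets; that part is left out here.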

Library Standards: Content representation

Metadata: data about data
Metadata types: descriptive, structural and administrative
Metadata schema: set of defined elements (e.g. MARC, DC)
MARC: MAchine Readable Cataloguing, international standard for representing and communicating bibliographic records, developed in the 60s, catalogue card oriented, high degree of complexity to cover all purposes
XML-MARC: XML schema based on MARC21 developed by Library of Congress
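As a rough illustration of what a metadata schema means in practice, the sketch below renames a few MARC fields to Dublin Core (DC) elements. The dictionary layout, the `MARC_TO_DC` table and the `to_dublin_core` helper are assumptions made for this example, and the table is only a small illustrative subset of a MARC-to-DC crosswalk, not a complete or official one.

```python
# A record as a mapping from (MARC tag, subfield code) to a value (sample values).
record = {
    ("100", "a"): "Clerbaux, Barbara",
    ("245", "a"): "Search for New Physics in Dijet",
    ("260", "c"): "2011",
}

# Illustrative subset of a MARC-to-Dublin-Core crosswalk (an assumption of this sketch).
MARC_TO_DC = {
    ("100", "a"): "creator",
    ("245", "a"): "title",
    ("260", "c"): "date",
    ("520", "a"): "description",
}


def to_dublin_core(marc_record):
    """Keep only the fields the crosswalk knows and rename them to DC elements."""
    return {
        MARC_TO_DC[key]: value
        for key, value in marc_record.items()
        if key in MARC_TO_DC
    }


dc = to_dublin_core(record)
print(dc)
```

The point of the sketch is the loss of detail: MARC distinguishes many tags and subfields, while DC keeps a small set of generic elements.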

MARC and XML-MARC examples

Tags, identifiers and subcodes

001__ 1337270
037__ $$aCERN-PH-EP-2011-030
100__ $$aClerbaux, Barbara $$eed. $$iINSPIRE-00314890 $$uBrussels U.
245__ $$aSearch for New Physics in Dijet
260__ $$c2011
520__ $$aA search for new interactions and resonances [..]

XML-MARC: tag 100

<datafield tag="100" ind1=" " ind2=" ">
  <subfield code="a">Clerbaux, Barbara</subfield>
  <subfield code="e">ed.</subfield>
  <subfield code="i">INSPIRE-00314890</subfield>
  <subfield code="u">Brussels U.</subfield>
</datafield>
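The relation between the two notations can be sketched in a few lines of Python. The function `marc_line_to_datafield` is a hypothetical helper written for this example, not part of Invenio; it assumes the text form shown above (a three-digit tag, two indicator characters with `_` meaning blank, then `$$`-prefixed subfields) and handles only data fields, not control fields such as 001.

```python
import xml.etree.ElementTree as ET


def marc_line_to_datafield(line):
    """Convert one text-MARC line such as '100__ $$aName $$uAffil' to a datafield element."""
    head, _, rest = line.partition(" ")
    tag = head[:3]
    # Underscores in the text form stand for blank indicators.
    ind1, ind2 = (c if c != "_" else " " for c in head[3:5])
    field = ET.Element("datafield", {"tag": tag, "ind1": ind1, "ind2": ind2})
    # Each chunk after a "$$" marker is a one-letter subfield code followed by its value.
    for chunk in rest.split("$$")[1:]:
        code, value = chunk[0], chunk[1:].strip()
        sub = ET.SubElement(field, "subfield", {"code": code})
        sub.text = value
    return field


line = "100__ $$aClerbaux, Barbara $$eed. $$iINSPIRE-00314890 $$uBrussels U."
field = marc_line_to_datafield(line)
print(ET.tostring(field, encoding="unicode"))
```

Running it on the tag 100 line of the sample record yields a `datafield` element with four `subfield` children, matching the XML-MARC fragment above apart from whitespace.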

Digital Library: Summary

Definitions of a digital library
The variety of types and concepts behind "Digital Library"
Examples of institutional and subject-based repositories
Some functionalities of Digital Libraries
Some important standards: MARC, SWORD, OAI-PMH
Next: the need for specialized software

Why specialized software?

Specialist software for building, maintaining, managing or running digital libraries.
Institutional repository software focuses primarily on ingest, preservation and access of locally produced documents, particularly locally produced academic outputs.

Content is organized and ready for exchange (support of interoperability protocols)
Metadata and data are preserved for the long term (support of preservation standards)
Submission, editing and curation processes are supported
Dissemination is organized and controlled
SW examples: Eprints, DSpace, Fedora, Greenstone... Invenio

Why at CERN? An interesting challenge

A physicist's office

Invenio History

1954: CERN laboratory is created
1989: Tim Berners-Lee invents the Web
1991: SPIRES (SLAC) is the first database on the Web
  ArXiv, the archive of Physics papers, moves to the Web
1993: CERN Preprint Server starts as an institutional and disciplinary repository
1996: CERN Library Server includes Books and Periodicals, as a hybrid library
2000: CERN Document Server includes Multimedia material and restricted notes
2002: CDSWare SW released open source
2006: CDSWare becomes Invenio; start of I18N collaborations
2010: Invenio 1.0 released and adopted world-wide

Key features: invenio-software.org

navigable collection tree (regular, virtual, hosted)
powerful search engine
  Google-like speed for up to 5M records
  combined metadata, reference and fulltext search
flexible metadata (MARC, OA)
handling any kind of document (multimedia)
customizable input, formatting and linking
personalization and collaborative features:
  alerts, baskets, groups, reviews, comments
internationalisation (28 languages)
Books management and circulation
open source, GNU General Public License
  co-developed by CERN (2002–), EPFL (2004–), DESY/FNAL/SLAC (2008–), CfA (2009–)
  installed at > 40 institutions world-wide

Extra Features: Plugins

Compatibility with

LibX: Invenio toolbar
  LibX: http://libx.org/editions/download.php?edition=4F46CD81
  Can be integrated with Internet Explorer and Firefox browsers
  Integration with the main digital content websites including Amazon, Google Scholar, Wikipedia
  Highlighted text from a web page can be used to directly query an Invenio installation
Zotero: Invenio can export its content to the Zotero Firefox plugin for compiling CVs
Cooliris: Invenio supports browsing multimedia content as a 3-dimensional wall (through integration with the Cooliris plugin)

Modules Overview

Modules Overview
Scheduler

Monitoring and scheduling processes

Ingestion Modules
Overview

Ingestion Modules
Submission: interfaces, workflows and functions

Processing Modules
Overview

Processing Modules
Example: indexing

Processing Modules
Example: ranking

Most Cited: count citations
All-Times Best: PageRank (Google)
‘Hot’ Trends: time-aware PageRank
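
The first two ranking methods named on this slide can be sketched in a few lines of Python. This is a minimal illustration, not Invenio's ranking code: the `citations` mapping, the function names, the damping factor of 0.85 and the fixed iteration count are all assumptions made for the example. The time-aware variant is not shown.

```python
def citation_counts(citations):
    """'Most Cited': count incoming citations for every record."""
    counts = {rec: 0 for rec in citations}
    for cited_list in citations.values():
        for cited in cited_list:
            counts[cited] = counts.get(cited, 0) + 1
    return counts


def pagerank(citations, damping=0.85, iterations=50):
    """Plain PageRank by power iteration on a citation graph.

    `citations` maps a record to the list of records it cites.
    Records that cite nothing spread their rank evenly over all records.
    """
    nodes = set(citations)
    for cited_list in citations.values():
        nodes.update(cited_list)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every record starts each round with the "random jump" share.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            cited_list = citations.get(node, [])
            if cited_list:
                share = damping * rank[node] / len(cited_list)
                for cited in cited_list:
                    new_rank[cited] += share
            else:
                share = damping * rank[node] / n
                for other in nodes:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical toy graph: A and B cite C, C cites D.
citations = {"A": ["C"], "B": ["C"], "C": ["D"], "D": []}
counts = citation_counts(citations)
ranks = pagerank(citations)
```

In this toy graph C is the most cited record, while D ends up with the highest PageRank because it is cited by the well-cited C. That difference is the reason for offering both rankings.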

Processing Modules
Example: ranking

inspirehep.net (500 random points)

Dissemination Modules
Overview

Dissemination Modules
Search Examples

Curation Modules
Overview

Curation Modules
Examples

Access Module
Authentication management

Access Module
Authorization management

Modules Summary
Invenio

About 33 modules

codebase
  290,000 lines of Python code
  12,000 lines of JavaScript code
  6,000 lines of XSL code
  5,000 lines of autotools code
  500 test cases

75 authors since inception
25 authors and contributors in 2010
many short-term students
importance of informal coding standards

10 years of development, started at CERN, first release in 2002, now co-developed world-wide (EU, US)

Technology Overview
used technologies

Open-source GPL project
Unix/Linux server side
Python (and C and Lisp), MySQL and Apache + mod_wsgi
Other smaller dependencies
Based on open standards (MARCXML, MARC21, OAI-PMH, OpenURL...)
Medium to big data repositories
Flexible at every layer

Technology Overview
concepts

Technology Overview
languages

Why Python?
languages

easy to read and understand (good for many temporary developers)
suitable for rapid prototyping (good for organic-growth software development model)
write code to throw it away

Why Python?
art of ikebana programming

Why Python?
Speeding up Python

Development model
Git distributed environment

good for distributed teams
offline development possible
“pull on demand” collaboration model (as opposed to “shared push” collaboration model)

inherent, natural code review process

commit early, commit often (to private repositories)
rebase and clean (before pushing for public consumption)
interplay with SVN

Development model
Git collaboration model

Development model
Test Suite: unit test

Development model
Test Suite: functional test

Development model
Test Suite: web testing

Building Indexes
loading Web vs App Server

Building Indexes
load split

Building Indexes
designing a search engine

performance-driven design assumptions:
  high number of selects, low number of updates
  fast searching, slow indexation
  cache everything cacheable

search functionality:
  search for words, phrases, regular expressions
  search in any field, authors, titles, etc

index design:
  forward indexes: rec1 -> [word1, word8, ...]
                   rec2 -> [word1, word2, ...]
  reverse indexes: word1 -> [rec1, rec2, ...]
                   word2 -> [rec2, rec7, ...]

Zipf’s law on word frequency:
  few words occur very often (e.g. the)
  most words are infrequent (even e.g. boson)
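
The forward and reverse indexes described on this slide can be illustrated with a small Python sketch. It is not Invenio's indexer: the tokenizer, the function names and the sample records are assumptions chosen for the example. Intersecting the shortest posting set first is one common way to benefit from the skewed word frequencies that Zipf's law predicts.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase alphanumeric words (a simplistic tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_indexes(records):
    """Build both index directions from {record id: text}."""
    forward = {}                 # rec  -> [word1, word8, ...]
    reverse = defaultdict(set)   # word -> {rec1, rec2, ...}
    for recid, text in records.items():
        words = sorted(set(tokenize(text)))
        forward[recid] = words
        for word in words:
            reverse[word].add(recid)
    return forward, dict(reverse)


def search_all_words(reverse, query):
    """Return the records containing every word of the query."""
    words = tokenize(query)
    if not words:
        return set()
    # Start from the rarest word so intermediate results stay small.
    postings = sorted((reverse.get(word, set()) for word in words), key=len)
    result = set(postings[0])
    for posting in postings[1:]:
        result &= posting
    return result


# Hypothetical sample records.
records = {
    1: "The Higgs boson search",
    2: "The search for supersymmetry",
    3: "Boson statistics",
}
forward, reverse = build_indexes(records)
hits = search_all_words(reverse, "the search")
```

The forward index is mainly useful when a record is updated or deleted, because it lists the reverse-index entries that must be touched. Searching only needs the reverse index.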

Building Indexes
Search engine under cover

Building Indexes
Measuring the performance

three important speed factors to consider:
  speed of finding sets (DB Server)
  speed of demarshaling sets (DB <-> Web App Server)
  speed of intersecting sets (Web App Server)
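
One way to see the three factors separately is to time each stage on its own. The sketch below is only an illustration under stated assumptions: a Python dict stands in for the database server, the standard `marshal` module stands in for whatever serialization the real system uses, and the word names and set sizes are invented.

```python
import marshal
import time


def timed(function, *args):
    """Run function(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = function(*args)
    return result, time.perf_counter() - start


# Stand-in for the DB: word -> serialized list of record ids (assumed data).
store = {
    "higgs": marshal.dumps(list(range(0, 300000, 3))),
    "boson": marshal.dumps(list(range(0, 300000, 2))),
}


def find_sets(words):
    """Stage 1: fetch the serialized sets (the DB server's job)."""
    return [store[word] for word in words]


def demarshal_sets(blobs):
    """Stage 2: turn the fetched bytes back into Python sets."""
    return [set(marshal.loads(blob)) for blob in blobs]


def intersect_sets(sets):
    """Stage 3: intersect the sets (the web application server's job)."""
    return set.intersection(*sets)


blobs, t_find = timed(find_sets, ["higgs", "boson"])
sets, t_demarshal = timed(demarshal_sets, blobs)
hits, t_intersect = timed(intersect_sets, sets)

print("find: %.6f s, demarshal: %.6f s, intersect: %.6f s"
      % (t_find, t_demarshal, t_intersect))
```

With an in-memory dict the first stage is nearly free, so the numbers say nothing about a real database. The point is only that the three costs can be measured independently before choosing a data structure.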

Building Indexes
Optimizing data structures

data structures tested:
  ‘sorted’ (lists, Patricia trees)
  ‘unsorted’ (hashed sets, binary vectors)

fast prototyping: (Python, Lisp in 2002)

Building Indexes
Binary vectors

binary vectors found to be the best compromise!
  using Numeric Python module
  typical search time gain: 4.0 sec -> 0.2 sec
  typical indexing time loss: 7 hours -> 4 days
  mostly sparse data modelled via a mostly dense data structure? free your mind, think critically

further optimization:
  Numeric module not addressing real bits, only bytes
  so home-made intbitset C extension in 2007
    addressing real bits (factor of 8 already)
    saving space, saving (indexing) time
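
The idea of addressing real bits can be shown with a small pure-Python bit vector. This is a sketch for illustration only: the `BitVector` class and its methods are invented here and are not the API of the intbitset C extension. Each record id occupies one bit, so a vector for N records needs about N/8 bytes, and a search intersection becomes a bitwise AND.

```python
class BitVector:
    """Set of record ids in [0, size), stored one bit per record."""

    def __init__(self, size, members=()):
        self.size = size
        self.bits = bytearray((size + 7) // 8)  # 8 records per byte
        for recid in members:
            self.add(recid)

    def add(self, recid):
        if not 0 <= recid < self.size:
            raise IndexError("record id out of range")
        self.bits[recid >> 3] |= 1 << (recid & 7)

    def __contains__(self, recid):
        if not 0 <= recid < self.size:
            return False
        return bool(self.bits[recid >> 3] & (1 << (recid & 7)))

    def __and__(self, other):
        """Intersection: bitwise AND over the common prefix of both vectors."""
        result = BitVector(min(self.size, other.size))
        n = len(result.bits)
        a = int.from_bytes(self.bits[:n], "little")
        b = int.from_bytes(other.bits[:n], "little")
        result.bits = bytearray((a & b).to_bytes(n, "little"))
        return result

    def __iter__(self):
        """Yield the member record ids in increasing order."""
        for byte_index, byte in enumerate(self.bits):
            for bit in range(8):
                if byte & (1 << bit):
                    yield byte_index * 8 + bit


# Hypothetical posting sets for two words over 100 records.
higgs = BitVector(100, [1, 5, 64, 99])
boson = BitVector(100, [5, 64, 70])
both = higgs & boson
```

Here 100 records fit in 13 bytes, where a one-byte-per-record vector would need 100. The trade-off named on the slide still applies: the vector is as long as the whole collection even when a word occurs in only a handful of records.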

Conclusion

selected lessons from building a digital library system with about 300,000 LOCs from 75 authors over 10 years:
  value of rapid prototyping
  value of organic-growth software development model
  value of coding aesthetics and minimalism

Evolution and challenges of digital libraries:
  Increase of Interoperability
  Open Access and Publishing model evolution
  The Data Continuum, connecting DLs and Science Datasets