lita national forum 2012

Building the New Open Linked Library (Revisited) Joel Richard LITA National Forum 2012 October 5, 2012

Upload: joel-richard

Post on 20-Dec-2014


DESCRIPTION

A follow-up to our 2011 presentation on the new Linked Open Digital Library, discussing how we are creating a digital library centered around Linked Open Data. Includes details on how we are creating a dataset of botanists and their publications that is to be shared as linked open data.

TRANSCRIPT

  • 1. Building the New Open Linked Library (Revisited) Joel Richard LITA National Forum 2012 October 5, 2012

2. Smithsonian Libraries
Founded in 1846. 1.5 million volumes in collection, plus assorted archival collections. 15,000 volumes scanned and online. 20 libraries serving ~500 researchers/curators, plus hundreds of fellows and interns. 105 library staff. 1.5 web staff. Founding member of the Biodiversity Heritage Library.
Image: Le Garde-meuble, ancien et moderne [Furniture repository, ancient and modern], 1839-1935

3. (From 2011) Drupal and Linked Data
Native support for RDFa in Drupal 7. RDF Extensions (rdfx) adds even more features. Vocabularies can be imported and cached for reuse. Few or no modifications to HTML to support RDFa. What's the difference between RDF, RDF/XML and RDFa?
LITA National Forum, September 30, 2011

4. (From 2011) RDF/XML Sample
URI: http://library.si.edu/book/origin-of-species.rdf
Values shown: The Origin of Species; November 24, 1859; 1000; english

5. (From 2011) TL-2 Page Sample
http://library.si.edu/tl2/author/darwin
    tl2:creatorOf http://library.si.edu/tl2/book/1313
    owl:sameAs http://viaf.org/viaf/27063124
http://library.si.edu/tl2/book/1313
    dc:creator http://library.si.edu/tl2/author/darwin
    owl:sameAs http://www.archive.org/details/originofspecies00darwuoft

6. (From 2011) TL-2 Page Sample Results
http://library.si.edu/tl2/author/darwin
    tl2:creatorOf http://library.si.edu/tl2/book/1313
    owl:sameAs http://viaf.org/viaf/27063124
    foaf:lastName Darwin
    foaf:familyName Darwin
    foaf:firstName Charles
    foaf:givenName Charles
    foaf:name Darwin, Charles Robert
    skos:prefLabel Darwin, Charles Robert
    tl2:birthYear 1809
    tl2:deathYear 1882
    tl2:description British evolutionary biologist
    tl2:personAbbrev Darwin
http://library.si.edu/tl2/book/1313
    dc:creator http://library.si.edu/tl2/author/darwin
    owl:sameAs http://www.archive.org/details/originofspecies00darwuoft
    tl2:bookNumber 1313
    bibo:shortTitle On the origin of species
    dc:title On the origin of species by means of natural selection, or the preservation of favoured races in the struggle for life.
    event:place London
    dc:publisher John Murray
    dc:created 1859
    tl2:bookAbbreviation Origin sp.

7. (From 2011)

8. (From 2011) Who is reusing our data?
Ryan Schenk: http://ryanschenk.com/2011/02/visualizing-taxonomic-synoymns/

9. (From 2011) Who is reusing our data?
Encyclopedia of Life: http://eol.org/

10. Linked Data Review
Publishing structured data on the web. RDF (Resource Description Framework). Enables computer-to-computer queries. Uses standard ontologies (vocabularies). Data is presented as triples:
    URI: http://library.si.edu/tl2/author/charles-darwin
    Predicate: owl:sameAs
    Object: http://viaf.org/viaf/27063124

11. Linked Data In Action
Google Knowledge Graph

12. Linked Data Review
    Charles Darwin, Type, Person
    Charles Darwin, Born On, Feb 12 1809
    Charles Darwin, Born In, Shrewsbury
    Shrewsbury, Type, City
    Shrewsbury, Is In, England
    England, Type, Country

13. Our Website
Organically grown since 1995. 83,000 HTML pages. 3,700 ColdFusion pages. 253,000 JPEG files. 27,000 PNG files. 46,000 PDFs. No CMS for legacy information. Now using Drupal for brochure-ware.

14. Content Analysis
400+ online books. Exhibitions. Research tools. Image collections (16,000+ images). Brochure content (About us, Locations, Hours). Bibliographies, fact sheets, subject guides. Databases, inventories, and database-like books. Collections not on our website: ~15,000 digitized volumes, with many more planned. Other analog collections that will be digitized.
Images: Bureau of American Ethnology Bulletin 164; Sewing Machine Trade Literature; Underwater Web Exhibition, Smithsonian Libraries

15. Linked Data in our Library
Books (and book-like objects): Expose bibliographic data for reuse. Consume links to other internal content and external authoritative data.
Databases: Expose data previously unavailable. Provide authoritative data. Consume our data and others' to create new aggregate websites.

16. Linked Data in our Books
http://library.si.edu/tl2/author/darwin
    RDF Type = foaf:Person
    foaf:lastName, foaf:familyName
    foaf:firstName, foaf:givenName
    foaf:name, skos:prefLabel
    tl2:birthYear
    tl2:deathYear
    tl2:description
    tl2:personAbbrev
http://library.si.edu/tl2/book/1313
    RDF Type = bibo:Book
    tl2:bookNumber
    dc:title
    event:place
    dc:publisher
    tl2:bookAbbreviation
    dc:created

17. Linked Data Tools (Drupal)
Fields, Views, Views UI. Node Reference. SPARQL Endpoint, SPARQL API. RESTful Web Services. SPARQL Views. RDF External Vocabulary Importer.
Caveat: Some modules are not ready for Drupal 7, e.g., the Biblio module (no CCK, RDF capabilities).

18. Disclaimer
We are still learning! How to effectively use Drupal. What goes into a Digital Library. How to best leverage Linked Open Data. (Also: We will always be learning.)
Image: J. L. Hammett, Illustrated Catalogue of School Merchandise 1872-1873, 1872-1874

19. What is a Digital Library?
More than a virtual stack of books. Digital allows more capabilities, access. Interlinked content ("See more from this item").
What content will be in our digital library? Digitized books. Lists / bibliographies. Image library. Smithsonian publications. Collections (of things). Videos. Exhibitions. Trade literature and databases. Other non-cataloged items.

20. Knowledge/Data Sharing
Taxonomic Literature II: Essential botanical reference. 15 volumes. 9,000 botanists. 37,000 titles authored by these botanists. More modern, simpler to handle.
Index Animalium: 35 volumes. 430,000 scientific names, each with a citation to first description. 7000+ items in the bibliography, many linked to WorldCat. Older, challenging in nature.

21. Our Process for TL-2
Scanned the pages. Hired contractor for OCR and correction (99.97% accuracy). Received XML dataset from contractor. Verified and imported to SQL Server. Built a website to search the data.

22. TL-2 Today

23. Before we import
What exactly does 99.97% accuracy mean? ~12,000 errors.

24. Importing
Millions of records are no problem for modern databases. But how to get data into Drupal? Use existing tools? Create my own import?
Image: The Muralo Company, Muralo: Sanitary Wall Coatings in the Home, 1912

25. Importing
Import via existing tools. Used Drupal's Feeds Importer. Typically used for importing RSS or similar. Fast to set up (< 5 minutes). Slow to import (47,000 records = 8+ hours). Poor error recovery (imported 5 times). What if the data changes in the future?
Faster. Better.

26. Importing
Write my own import. But how? Make a Drupal module! Steep learning curve (many APIs). Faster to set up (48,000 records = 85 minutes). Added bonus: modules can be versioned. Can use the version update code to update our data. Versioned modules are good for Dev / Prod servers.

27. Importing
Digitized Books Online. Similar module for importing. Module also handles a page for reading books online. Uses Internet Archive book reader in an [...]. Links to WorldCat / VIAF. FAST Subjects. Table of Contents navigation. Eligible for Linked Open Data.
http://archive.org/details/smithsonian

28. Data Schema: British Library
http://talis-systems.com/wp-content/uploads/2011/07/British-Library-Data-Model-v1.01.pdf

29. Data Schema
What data model are we going to use? British Library. Schema.org. Something else?
What vocabularies are we using? Dublin Core. FOAF. OWL. SKOS. BIBO. BIO. Event? Org? Geo? Our own vocabulary for TL-2.

30. Other Content
Galaxy of Images. Image collection of plates from our digitized books. 18,000 images and growing. Richer set of metadata. Data needs to be massaged / imported. Images served from another system.
http://www.sil.si.edu/imagegalaxy/

31. Other Content
Videos. All are currently on YouTube. Will remain there for now. Metadata to be imported to Digital Library. Will eventually be served from our network.
http://www.youtube.com/smithsonianlibraries

32. Other Content
Collections and Exhibitions. Bibliographies, lists, subject guides. Trade Literature: Sewing machines! Scientific equipment! Seed catalogs! Smithsonian Publications (DSpace). Smithsonian Libraries Blog. Art and Artist Vertical Files.
Image: W. Atlee Burpee & Co., Burpee's New Annual for 1910, 1910

33. Future Work
More planning! Developing a LOD vocabulary for TL-2. Continued parsing of content in TL-2. Continuing the development of the Index Animalium content. Publishing the Index Animalium on the web as LOD. How to leverage linked data to create ... what?
Image: Leopoldo Galluzzo, Altre scoverte fatte nella luna dal Sigr. Herschel, 1836

34. Thank you!
Joel Richard
[email protected]
@cajunjoel
http://slideshare.net/joelrichard
http://library.si.edu/staff/richardjm
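
Note on the TL-2 triples (slides 5, 6 and 10): the subject, predicate, object pattern can be illustrated with a small Python sketch that expands the prefixed predicates and prints N-Triples-style lines. This is only an illustration, not the code behind the TL-2 site. The namespace URIs are the published ones for OWL, FOAF, SKOS and Dublin Core, but reading "dc" as Dublin Core Elements 1.1 is an assumption, and the tl2 predicates are left out because the slides do not give that vocabulary's namespace. The names PREFIXES, expand, term and to_ntriples are made up for this example.

```python
# Namespace URIs for the well-known prefixes used on the slides.
# Assumption: "dc" means Dublin Core Elements 1.1. The tl2 prefix is
# omitted because its namespace URI is not given in the slides.
PREFIXES = {
    "owl": "http://www.w3.org/2002/07/owl#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "dc": "http://purl.org/dc/elements/1.1/",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}

AUTHOR = "http://library.si.edu/tl2/author/darwin"
BOOK = "http://library.si.edu/tl2/book/1313"

# A few (subject, predicate, object) triples copied from slide 6.
TRIPLES = [
    (AUTHOR, "owl:sameAs", "http://viaf.org/viaf/27063124"),
    (AUTHOR, "foaf:name", "Darwin, Charles Robert"),
    (AUTHOR, "skos:prefLabel", "Darwin, Charles Robert"),
    (BOOK, "dc:creator", AUTHOR),
    (BOOK, "dc:publisher", "John Murray"),
]


def expand(curie):
    """Turn a prefixed name such as 'owl:sameAs' into a full URI."""
    prefix, local = curie.split(":", 1)
    return PREFIXES[prefix] + local


def term(value):
    """Format an object as a URI reference or as a quoted plain literal."""
    if value.startswith(("http://", "https://")):
        return f"<{value}>"
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_ntriples(triples):
    """Serialize the triples one per line, each ending with ' .'."""
    return "\n".join(
        f"<{s}> <{expand(p)}> {term(o)} ." for s, p, o in triples
    )


print(to_ntriples(TRIPLES))
```

Keeping the data as plain tuples makes it easy to see that each line of slide 6 is one statement about either the author URI or the book URI; a real project would more likely use an RDF library for parsing and serialization.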
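
Note on the numbers in slides 23, 25 and 26: a rough back-of-the-envelope check, sketched in Python. The slides do not say what unit the 99.97% accuracy is measured in; assuming it is per character, ~12,000 errors would imply a corpus of roughly 40 million characters. Since the Feeds import is quoted as "8+ hours", its rate below is an upper bound and the speedup a lower bound. The function names are made up for this example.

```python
def implied_total(errors, accuracy):
    """Number of units checked, given an error count and an accuracy rate."""
    return errors / (1.0 - accuracy)


def records_per_second(records, seconds):
    """Average import throughput."""
    return records / seconds


# Slide 23: 99.97% accuracy and ~12,000 errors (unit assumed: characters).
total_chars = implied_total(12_000, 0.9997)

# Slide 25: Feeds Importer, 47,000 records in 8+ hours.
feeds_rate = records_per_second(47_000, 8 * 3600)

# Slide 26: custom Drupal module, 48,000 records in 85 minutes.
module_rate = records_per_second(48_000, 85 * 60)

print(f"Implied corpus size: about {total_chars:,.0f} characters")
print(f"Feeds Importer: at most {feeds_rate:.2f} records/second")
print(f"Custom module:  {module_rate:.2f} records/second")
print(f"Speedup: at least {module_rate / feeds_rate:.1f}x")
```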