Taking TL-2 Online: A Linked Data Resource

Martin R. Kalfatovic and Joel Richard, Smithsonian Libraries. TDWG 2011 Annual Conference, New Orleans, LA, 18 October 2011.


TRANSCRIPT

Page 1: Taking TL-2 Online: A Linked Data Resource

Taking TL-2 Online
A Linked Data Resource

Martin R. Kalfatovic
Joel Richard
Smithsonian Libraries

TDWG 2011 Annual Conference
New Orleans, LA
18 October 2011

Page 2

TL-2 print volumes … by the numbers

Volume     Year  Pages         Authors    Entry numbers
TL-1       1967  xx, 556       A-Z        [not numbered]
TL-2/1     1976  xl, 1136      A-G        1-2223
TL-2/2     1979  xviii, 991    H-Le       2224-4483
TL-2/3     1981  xii, 980      Lh-O       4484-7174
TL-2/4     1983  ix, 1214      P-Sak      7175-10,104
TL-2/5     1985  [v], 1066     Sal-Ste    10,105-13,105
TL-2/6     1986  [v], 926      Sti-Vuy    13,106-16,459
TL-2/7     1988  lvi, 653      W-Z        16,460-18,785
Suppl. 1   1992  viii, 453     A-Ba       18,786-20,458
Suppl. 2   1993  vi, 464       Be-Bo      20,459-22,485
Suppl. 3   1995  vi, 550       Br-Ca      22,486-25,190
Suppl. 4   1997  vi, 614       Ce-Cz      25,191-28,566
Suppl. 5   1998  viii, 431     Da-Di      28,567-30,948
Suppl. 6   2000  vi, 518       Do-E       30,949-33,658
Suppl. 7   2009  xviii, 469    F-Frer     33,659-35,497
Suppl. 8   2009  viii, 560     Fress-G    35,498-37,609
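The figures in this table can be cross-checked mechanically. The minimal Python sketch below is not part of the project's own tooling: it transcribes the Arabic-numbered page counts and entry ranges of the fifteen TL-2 volumes and supplements, confirms that the entry numbers run on without gaps or overlaps, and totals the pages. Roman-numbered front matter and the unnumbered first edition (TL-1) are left out.

```python
# (volume, year, Arabic-numbered pages, first entry, last entry),
# transcribed from the table; TL-1 is omitted because its entries are unnumbered.
VOLUMES = [
    ("TL-2/1", 1976, 1136, 1, 2223),
    ("TL-2/2", 1979, 991, 2224, 4483),
    ("TL-2/3", 1981, 980, 4484, 7174),
    ("TL-2/4", 1983, 1214, 7175, 10104),
    ("TL-2/5", 1985, 1066, 10105, 13105),
    ("TL-2/6", 1986, 926, 13106, 16459),
    ("TL-2/7", 1988, 653, 16460, 18785),
    ("Suppl. 1", 1992, 453, 18786, 20458),
    ("Suppl. 2", 1993, 464, 20459, 22485),
    ("Suppl. 3", 1995, 550, 22486, 25190),
    ("Suppl. 4", 1997, 614, 25191, 28566),
    ("Suppl. 5", 1998, 431, 28567, 30948),
    ("Suppl. 6", 2000, 518, 30949, 33658),
    ("Suppl. 7", 2009, 469, 33659, 35497),
    ("Suppl. 8", 2009, 560, 35498, 37609),
]


def check_volumes(volumes):
    """Return (volume count, total pages, last entry number).

    Raises ValueError if a volume's first entry does not directly follow
    the previous volume's last entry (a gap or an overlap).
    """
    expected_first = 1
    for label, _year, _pages, first, last in volumes:
        if first != expected_first:
            raise ValueError(f"{label}: expected entry {expected_first}, found {first}")
        expected_first = last + 1
    total_pages = sum(pages for _label, _year, pages, _first, _last in volumes)
    return len(volumes), total_pages, expected_first - 1


print(check_volumes(VOLUMES))  # (15, 11025, 37609)
```

The result is consistent with the 15 volumes and roughly 11,000 pages under "Scope of the Project" and with the 37,609 numbered citations in the quotation from Judy Warnement.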

Page 3

While TL-2 is an essential reference tool to plant scientists, reference librarians, and catalogers, it is less well known to the broader natural sciences community. The community will immediately benefit from open online access to the detailed treatments of 9,072 authors and 37,609 numbered citations....

The content provides precise dates of publication, details about specific titles, editions, and related publications, associated authors, biographical details including dates, education and career highlights, and disposition of herbaria.

-Judy Warnement, Harvard Univ. Botany Libraries

Page 4

Scope of the Project

> Print version: 15 volumes
> Pages: ~11,000
> Characters/page: avg. of 3,400
> Author entries: ~44,000
> Image files: ~9 GB in size

Page 5

Project Starts: 2010

IAPT (International Association for Plant Taxonomy) and Smithsonian Libraries sign an agreement to create a new online version of TL-2

Funded by the Smithsonian's Atherton Seidell Endowment

Page 6

Image and File Creation

Print volumes scanned by the Internet Archive

100% quality control review of scanned pages by SIL staff

Re-keyed to 99.97% accuracy
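What 99.97% accuracy means in practice depends on how it was measured, which the slide does not say. Assuming it is a per-character rate (an assumption), and using the averages from the "Scope of the Project" slide, a back-of-the-envelope Python sketch gives about one residual error per page:

```python
CHARS_PER_PAGE = 3_400  # average characters per page ("Scope of the Project")
PAGES = 11_000          # approximate page count ("Scope of the Project")
ACCURACY = 0.9997       # re-keying accuracy; ASSUMED here to be per character


def expected_errors(characters, accuracy):
    """Expected number of wrong characters if each is correct with probability `accuracy`."""
    return characters * (1 - accuracy)


per_page = expected_errors(CHARS_PER_PAGE, ACCURACY)
overall = expected_errors(CHARS_PER_PAGE * PAGES, ACCURACY)
print(f"~{per_page:.2f} errors per page, ~{overall:,.0f} in total")
# ~1.02 errors per page, ~11,220 in total
```

If accuracy was instead counted per word or per field, the per-page figure would differ.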

Page 7

Project Milestones I

December 2010

IAPT staff provide machine-readable versions of more recent volumes

January 2011

Scanning of print volumes complete; image files available on the Biodiversity Heritage Library (BHL)

Page 8

Project Milestones II

September 2011

Test conversion done and conversion methodologies approved

November 2011

Completion of text conversion of TL-2

Page 9

Project Milestones III

January 2012

TL-2 Online, version 1.0, publicly available via a Smithsonian Institution Libraries website. This version will provide, at a minimum, all the functionality currently provided by the existing limited-access version.

Page 10

Example of Conversion

Specs

> Introduction sections can be omitted. Introductory text to the indexes can be omitted.
> Accented letters and diacritics must be preserved.
> The beginning of all non-indented lines should be indicated with a <br/> tag.
> Indented lines of text are not indicated.
> Bold and italicized text should not be indicated.
> Hyphenated words will be maintained throughout the text.
> The presence of a table should be indicated by a <Table/> tag. No other parsing should be done.
> Each line in the indexes will be converted to simple XML, but not parsed into fields.
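Some of these rules are mechanical enough to sketch. The Python below is a hypothetical illustration, not the conversion vendor's actual process: it assumes each printed line arrives as a string whose leading whitespace marks indentation, prefixes non-indented lines with <br/>, passes indented lines through unmarked, and wraps index lines in simple XML. The element names `index` and `line` are placeholders, since the project's XSD is not reproduced here; table detection (the <Table/> rule) needs human judgment and is not covered.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def mark_line_starts(lines):
    """<br/> before each non-indented line, nothing before indented lines.

    Blank lines are dropped. Accented characters pass through unchanged
    because Python strings are Unicode.
    """
    out = []
    for line in lines:
        if not line.strip():
            continue
        text = escape(line.strip())  # escape &, <, > so the output stays valid XML
        out.append(text if line[0].isspace() else "<br/>" + text)
    return "\n".join(out)


def index_to_xml(index_lines):
    """Wrap each index line in its own element without parsing it into fields."""
    root = ET.Element("index")  # hypothetical element names
    for line in index_lines:
        ET.SubElement(root, "line").text = line.strip()
    return ET.tostring(root, encoding="unicode")


# Hypothetical sample input, for illustration only.
print(mark_line_starts(["Dupré, Jean", "    (1801-1870), botaniste.", ""]))
# <br/>Dupré, Jean
# (1801-1870), botaniste.
print(index_to_xml(["Dupré, Jean", "Flora & fauna"]))
# <index><line>Dupré, Jean</line><line>Flora &amp; fauna</line></index>
```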

Page 11

XSD and Sample of Converted Text

Page 12
Page 13
Page 14

What do you want?

February 2012 forward

TL-2 Online, version X.0: additional functionality will be added to TL-2 Online; details of this functionality will be developed with input from the botanical and taxonomic community.

Page 15

Obligatory Biodiversity Heritage Library Slide

Page 16
Page 17
Page 18

Planned Future Developments I

Initially, TL-2 will be presented in a basic website that is searchable by keyword, botanist name, TL-2 title number, or TL-2 botanist or title abbreviation. The website will display the search results with the scanned page (as zoomable JPEGs) and the parallel OCRed and corrected text. The full OCRed text may be made available for download, and the scanned pages can also be browsed in a "page turning" application. The TL-2 site will take this form until it migrates to the Libraries' Digital Library website next spring.
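The four search keys map naturally onto a small in-memory filter. The Python sketch below is illustrative only: the record fields, the sample entry, and the accent-insensitive matching are assumptions, not the site's actual design.

```python
import unicodedata
from dataclasses import dataclass, field


def fold(text):
    """Case-fold and strip diacritics so that 'Dupré' and 'dupre' compare equal."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).casefold()


@dataclass
class Title:
    number: int        # TL-2 title number
    abbreviation: str  # TL-2 title abbreviation


@dataclass
class Botanist:
    name: str
    abbreviation: str  # TL-2 botanist abbreviation
    text: str          # corrected OCR text of the entry
    titles: list = field(default_factory=list)  # list of Title


def search(botanists, keyword=None, name=None, title_number=None, abbreviation=None):
    """Return the botanists matching every criterion that is supplied."""
    hits = []
    for b in botanists:
        abbreviations = {fold(b.abbreviation)} | {fold(t.abbreviation) for t in b.titles}
        if keyword and fold(keyword) not in fold(b.text):
            continue
        if name and fold(name) not in fold(b.name):
            continue
        if title_number is not None and all(t.number != title_number for t in b.titles):
            continue
        if abbreviation and fold(abbreviation) not in abbreviations:
            continue
        hits.append(b)
    return hits


# Hypothetical sample record, for illustration only.
records = [Botanist("Dupré, Jean", "Dupré", "Sample entry text mentioning a herbarium.",
                    [Title(99999, "Fl. exempl.")])]
print([b.name for b in search(records, name="dupre")])  # ['Dupré, Jean']
```

A production site would use a full-text search engine rather than a linear scan, but the same four keys would apply.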

Page 19

Planned Future Developments II

The second round of planned improvements to the TL-2 site includes implementing Linked Open Data for the entire TL-2 dataset. This computer-friendly format will enhance the reusability of the TL-2 data for projects now and in the future. Each botanist and title will have a unique URI on the Libraries' website. This URI will be a permanent, authoritative location on the web for the botanists and titles, and for information about each in both human-readable form (via HTML) and computer-readable form (via RDF/XML). The implementation of Linked Open Data also facilitates the creation of a SPARQL endpoint, which allows the data contained in our website to be queried like a database.
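A minimal sketch of the two representations behind one URI, in Python with only the standard library. The URI pattern, the endpoint address (example.org), and the use of FOAF's `name` property are placeholders, not the Libraries' actual design; the content negotiation is deliberately simplified and ignores q-values. The last function builds a query URL in the GET form that the SPARQL 1.1 Protocol allows.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FOAF_NS = "http://xmlns.com/foaf/0.1/"
BASE_URI = "https://example.org/tl2/botanist/"      # hypothetical URI pattern
SPARQL_ENDPOINT = "https://example.org/tl2/sparql"  # hypothetical endpoint

ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("foaf", FOAF_NS)


def botanist_uri(tl2_id):
    """One stable URI per botanist."""
    return f"{BASE_URI}{tl2_id}"


def choose_representation(accept_header):
    """RDF/XML if the client lists it in Accept, otherwise HTML (q-values ignored)."""
    media_types = {part.split(";")[0].strip() for part in accept_header.split(",")}
    return "application/rdf+xml" if "application/rdf+xml" in media_types else "text/html"


def botanist_rdf_xml(tl2_id, name):
    """Machine-readable description of one botanist as RDF/XML."""
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    description = ET.SubElement(root, f"{{{RDF_NS}}}Description",
                                {f"{{{RDF_NS}}}about": botanist_uri(tl2_id)})
    ET.SubElement(description, f"{{{FOAF_NS}}}name").text = name
    return ET.tostring(root, encoding="unicode")


def sparql_query_url(query):
    """GET URL for a SPARQL query, passed in the 'query' parameter."""
    return SPARQL_ENDPOINT + "?" + urlencode({"query": query})


print(choose_representation("application/rdf+xml;q=0.9, text/html;q=0.5"))
print(botanist_rdf_xml(42, "Dupré, Jean"))
print(sparql_query_url("SELECT ?name WHERE { ?b <http://xmlns.com/foaf/0.1/name> ?name }"))
```

A browser that asks only for HTML gets the human-readable page at the same URI, while a crawler that asks for `application/rdf+xml` gets the data.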

Page 20

Planned Future Developments III

We plan to add to our linked data by parsing the herbarium names, with the goal of linking them to their full names in the TL-2 index and to an external location on the Web. Once the herbaria are identified and linked, they can be searched forward, by listing the herbaria that hold a botanist's plant specimens, and backward, by indicating which TL-2 botanists contributed to a given herbarium.
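Forward and backward searching amounts to indexing the botanist-herbarium links in both directions. A minimal Python sketch, assuming herbarium mentions have already been parsed into (botanist, herbarium name) pairs; the alias table that resolves variant names to one canonical identifier, and every identifier shown, are hypothetical.

```python
from collections import defaultdict

# Hypothetical alias table: variant herbarium names as printed -> canonical id.
ALIASES = {
    "herb. exempl.": "HERB-EX",
    "herbarium exemplum": "HERB-EX",
}


def resolve(raw_name, aliases=ALIASES):
    """Map a herbarium name as printed to a canonical identifier, or None if unknown."""
    return aliases.get(" ".join(raw_name.casefold().split()))


def build_indexes(links):
    """links: iterable of (botanist_id, raw herbarium name) pairs.

    Returns (forward, backward, unresolved): botanist -> herbaria,
    herbarium -> botanists, and the pairs left for manual review.
    """
    forward, backward, unresolved = defaultdict(set), defaultdict(set), []
    for botanist, raw in links:
        herbarium = resolve(raw)
        if herbarium is None:
            unresolved.append((botanist, raw))
            continue
        forward[botanist].add(herbarium)    # forward: where are this botanist's specimens?
        backward[herbarium].add(botanist)   # backward: who contributed to this herbarium?
    return forward, backward, unresolved


forward, backward, unresolved = build_indexes([
    ("tl2:1", "Herb. Exempl."),
    ("tl2:2", "Herbarium  Exemplum"),
    ("tl2:2", "Herb. Unknown"),
])
print(sorted(backward["HERB-EX"]))  # ['tl2:1', 'tl2:2']
```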

Page 21

Planned Future Developments IV

Additionally, we plan to match each botanist in TL-2 to his or her record in the Virtual International Authority File (VIAF), to improve identification of the botanist and the ability to link to other sources on the internet. Similarly, we hope to decode and resolve the bibliographic entries for each botanist and link them to the Biodiversity Heritage Library or other appropriate online databases.
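Matching TL-2 botanists to authority records is a reconciliation problem. The Python sketch below shows one simple approach, not the project's: compare names with diacritics, punctuation, case, and word order normalized away, and require any life dates present on both sides to agree. It works on candidate records already in hand; how candidates are retrieved from VIAF is left out, and all names, dates, and identifiers in the example are hypothetical.

```python
import unicodedata
from typing import NamedTuple, Optional


class Person(NamedTuple):
    name: str
    born: Optional[int] = None
    died: Optional[int] = None
    record_id: str = ""  # authority-record identifier (hypothetical here)


def name_key(name):
    """Order-insensitive key: 'Müller, Hans' and 'Hans Muller' give the same key."""
    decomposed = unicodedata.normalize("NFKD", name)
    plain = "".join(c for c in decomposed if not unicodedata.combining(c))
    for mark in ",.":
        plain = plain.replace(mark, " ")
    return frozenset(plain.casefold().split())


def dates_agree(a, b):
    """Dates known on both sides must match; a missing date never rules a match out."""
    return all(x is None or y is None or x == y
               for x, y in ((a.born, b.born), (a.died, b.died)))


def candidate_matches(botanist, candidates):
    """Candidates whose normalized name and known life dates agree with the TL-2 entry."""
    key = name_key(botanist.name)
    return [c for c in candidates if name_key(c.name) == key and dates_agree(botanist, c)]


tl2_entry = Person("Müller, Hans", 1800, 1870)
candidates = [Person("Hans Muller", 1800, 1870, "rec-1"),
              Person("Hans Muller", 1801, 1860, "rec-2")]
print([c.record_id for c in candidate_matches(tl2_entry, candidates)])  # ['rec-1']
```

Exact token matching will miss initials ("H. Müller"), so a real workflow would still need fuzzy scoring and human review.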

Page 22

Planned Future Developments V

Finally, each botanist may have one or more taxa named after him or her. This information includes a genus name, the person who named the genus (the author), and the year the name was published. We aim to identify these names and link them to the Encyclopedia of Life, the Biodiversity Heritage Library, or other more appropriate online databases of names. Additionally, we would like to connect the author to his or her record in TL-2, if it exists, thereby creating additional internal cross-links in TL-2.
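Identifying these names starts with splitting each eponym into its parts. The Python sketch below assumes, hypothetically, that an eponym is written as genus, author abbreviation, then year in parentheses; real TL-2 entries will need a more forgiving parser. The lookup table stands in for the internal cross-links described above, and its record identifiers are made up.

```python
import re
from typing import NamedTuple, Optional

# Assumed layout: "<Genus> <author abbreviation> (<year>)"
EPONYM_PATTERN = re.compile(
    r"^(?P<genus>[A-Z][a-z-]+)\s+(?P<author>.+?)\s+\((?P<year>\d{4})\)$")


class Eponym(NamedTuple):
    genus: str
    author: str
    year: int


def parse_eponym(text: str) -> Optional[Eponym]:
    """Split one eponym string into genus, author, and year; None if it does not fit."""
    match = EPONYM_PATTERN.match(text.strip())
    if match is None:
        return None
    return Eponym(match["genus"], match["author"], int(match["year"]))


def tl2_record_for_author(eponym: Eponym, records_by_abbreviation: dict) -> Optional[str]:
    """Internal cross-link: the TL-2 record of the name's author, if TL-2 has one."""
    return records_by_abbreviation.get(eponym.author)


eponym = parse_eponym("Boerhavia L. (1753)")
print(eponym)  # Eponym(genus='Boerhavia', author='L.', year=1753)
print(tl2_record_for_author(eponym, {"L.": "tl2-record-linnaeus"}))  # hypothetical id
```

The parsed genus name is what would then be looked up in the Encyclopedia of Life or the Biodiversity Heritage Library.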

Page 23

Can you name the botanists you saw?

Herman Boerhaave, Nathaniel Lord Britton, A.P. de Candolle, Carolus Clusius, Cadwallader Colden, Erasmus Darwin, R.L. Desfontaines, Larry Dorr, Henry Englefield, Joseph Dalton Hooker

C.M. Hovey, N.J. von Jacquin, Bernard de Jussieu, Carl Linnaeus, C.F.B. Mirbel, Dan Nicolson, J.W. Palmstruch, Richard Pulteney, Henry Shaw, Martin Vahl, Judy Warnement

Page 24

Thanks to the following collaborators on the project

Susan Fraser, Don Wheeler, Judy Warnement, Doug Holland, Chris Freeland, IAPT, Unlimited Priorities, Internet Archive, Data Conversion Lab

Smithsonian Team: Gilbert Borrego, Grace Costantino, Larry Dorr, Robin Everly, Sue Graves, Suzanne Pilsk, Joel Richard, Keri Thompson