Bookshelf: Leafing through XML
NLM Journal Article Tag Suite Conference 2010
Martin Latterner and Marilu Hoeppner
National Center for Biotechnology Information
National Library of Medicine
Latterner M and Hoeppner MA. Bookshelf: Leafing through XML. JATS-CON 2010
Entrez Literature Resources
NLM Collection Catalog: books, monographs, reports; journals; other publication formats
PubMed: abstracts
Electronic literature archive:
  Bookshelf: book chapters, monographs, reports; books in PubMed; non-PubMed books; user guides, documentation
  PubMed Central (PMC): journal articles from PMC journals
Features of the Book DTD
Books and journals within PubMed Central
Bookshelf workflows
Integration of information between databases
Modifications
• Allowed <icon> as a child of <exlnk>.
• Allowed <pre> as a child of <entry>.
• Allowed <glossary> as a child of <chapter>.
• Added type: ppt.
• Added attributes id and BID to <foot>.
• Added attribute id to <p>.
• Added <title> as a child of <bibsect>.
• Added <bb>, <gf>, and <figgrp> as children of <linkgrp>.
• Added <email> as a child of <txtstyle>.
• Added <pdf> as a child of <glossary>.
• Added <figgrp1> as a child of <entry>.
• …
NCBI Book DTD 1.0: based on the ISO 12083 Article DTD
Bookshelf XML data: NCBI Book DTD
March 2003: v1.0
December 2004: v2.0
November 2005: v2.1
Book DTD of the NLM Journal Article Tag Suite
Designed to capture the semantic elements of the content, not its form
e.g., bibliographic metadata
<front>
  <div type="titlepage" level="1" id="2001902bddd00001">
    <booktitle><ils style="strong">CONFLICT OF INTEREST IN MEDICAL RESEARCH</ils></booktitle>
    <bookauthor>
      <bookauthor.name>Committee on Conflict of Interest in Medical Research</bookauthor.name>
      <bookauthor.info>Board on Health Sciences Policy</bookauthor.info>
      <bookauthor.info>INSTITUTE OF MEDICINE <ils style="smallcap"> <ils style="emphasis"> OF THE NATIONAL ACADEMIES</ils> </ils> </bookauthor.info>
    </bookauthor>
    <publication.stmt>
      <p style="center">
        <publisher>
          <publisher.name>THE NATIONAL ACADEMIES PRESS</publisher.name>
          <publisher.address><state>Washington, D.C.</state></publisher.address>
        </publisher>
      </p>
    </publication.stmt>
    <page number="ii" id="2001902bppp00002"/>
  </div>
  <div type="copyrightpage" level="1" id="2001902bddd00002">
    <publication.stmt>
      <p style="normal">
        <publisher>
          <publisher.name><ils style="strong">THE NATIONAL ACADEMIES PRESS</ils></publisher.name>
          <publisher.address>
            <street><ils style="strong">500 Fifth Street, N.W.</ils></street>
            <state><ils style="strong">Washington, DC</ils></state>
            <postcode><ils style="strong">20001</ils></postcode>
          </publisher.address>
        </publisher>
      </p>
    </publication.stmt>
    <publication.stmt>
      <p style="flindent">ISBN <isbn>978-0-309-13188-9</isbn> (hardcover)</p>
    </publication.stmt>
    <copyright>Copyright <copyright.year>2009</copyright.year> by the <copyright.holder>National Academy of Sciences</copyright.holder>. All rights reserved.</copyright>
    <printinfo>
      <print>Printed in the United States of America</print>
    </printinfo>
  </div>
</front>
<book-meta>
  <book-title-group>
    <book-title>Conflict of Interest in Medical Research</book-title>
  </book-title-group>
  <contrib-group>
    <contrib contrib-type="author">
      <collab>Institute of Medicine (US) Committee on Conflict of Interest in Medical Research, Education, and Practice</collab>
    </contrib>
  </contrib-group>
  <publisher>
    <publisher-name>National Academies Press (US)</publisher-name>
    <publisher-loc>Washington (DC)</publisher-loc>
  </publisher>
  <isbn>978-0-309-13188-9</isbn>
  <pub-date pub-type="ppub">
    <year>2009</year>
  </pub-date>
  <permissions>
    <copyright-statement>Copyright © 2009, National Academy of Sciences</copyright-statement>
    <copyright-year>2009</copyright-year>
  </permissions>
</book-meta>
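Because <book-meta> names each bibliographic field explicitly, loading code can read fields by path instead of inferring them from typographic markup such as <ils style="strong">. A minimal sketch using Python's standard xml.etree.ElementTree against an abbreviated copy of the example; the function name extract_metadata is hypothetical, not part of any NCBI tool.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the <book-meta> example (contrib and permissions omitted).
BOOK_META = """<book-meta>
  <book-title-group>
    <book-title>Conflict of Interest in Medical Research</book-title>
  </book-title-group>
  <publisher>
    <publisher-name>National Academies Press (US)</publisher-name>
    <publisher-loc>Washington (DC)</publisher-loc>
  </publisher>
  <isbn>978-0-309-13188-9</isbn>
  <pub-date pub-type="ppub"><year>2009</year></pub-date>
</book-meta>"""


def extract_metadata(book_meta_xml: str) -> dict:
    """Read a few bibliographic fields from a <book-meta> element by path."""
    meta = ET.fromstring(book_meta_xml)
    return {
        "title": meta.findtext("book-title-group/book-title"),
        "publisher": meta.findtext("publisher/publisher-name"),
        "location": meta.findtext("publisher/publisher-loc"),
        "isbn": meta.findtext("isbn"),
        # Select the print publication date by its pub-type attribute.
        "year": meta.findtext("pub-date[@pub-type='ppub']/year"),
    }


if __name__ == "__main__":
    print(extract_metadata(BOOK_META))
```

The same lookups against the older NCBI Book DTD markup would have to distinguish the publisher name on the title page from the one on the copyright page, which is exactly the ambiguity the semantic tagging removes.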
More granular distinctions are handled at the attribute level
e.g., preface, foreword
<sec sec-type="preface">
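Since a preface stays a plain <sec> and only its sec-type attribute says what kind of section it is, software selects such sections by attribute value rather than by element name. A small sketch with ElementTree; the fragment is simplified and not validated against the DTD, and sections_of_type is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Simplified front-matter fragment: every section is a <sec>, and its role
# is carried by the sec-type attribute (the last one has none).
FRONT = """<book-front>
  <sec sec-type="foreword"><title>Foreword</title></sec>
  <sec sec-type="preface"><title>Preface</title></sec>
  <sec><title>Acknowledgments</title></sec>
</book-front>"""


def sections_of_type(xml_text: str, sec_type: str) -> list:
    """Return the titles of all <sec> elements whose sec-type matches."""
    root = ET.fromstring(xml_text)
    return [sec.findtext("title") for sec in root.iter("sec")
            if sec.get("sec-type") == sec_type]
```

For example, sections_of_type(FRONT, "preface") yields the single title "Preface", while untyped sections are never matched.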
DTD v3.0 Elements
Article: <abbrev-journal-title>, <article>, <article-categories>, <article-id>, <article-meta>, <conf-acronym>, <conference>, <conf-num>, <conf-theme>, <floats-group>, <front>, <front-stub>, <issue-sponsor>, <journal-meta>, <journal-subtitle>, <journal-title>, <journal-title-group>, <response>, <series-text>, <series-title>, <string-conf>, <sub-article>, <unstructured-kwd-group>, <x>
Book: <alternate-form>, <area>, <book>, <book-front>, <book-meta>, <book-part>, <book-part-categories>, <book-part-meta>, <book-title>, <book-title-group>, <collection>, <collection-id>, <collection-list>, <collection-member>, <collection-meta>, <collection-name>, <map>, <map-group>, <multi-link>
<map-group>
XML
<map-group id="my-map-id">
  <graphic xlink:href="img-uri"/>
  <map map-name="my-map">
    <area map-shape="rect" map-coords="1,1,51,76" xlink:href="uri1"/>
    <area map-shape="rect" map-coords="54,4,94,74" xlink:href="uri2"/>
  </map>
</map-group>
XHTML
<img src="img-uri" usemap="#my-map-id"/>
<map id="my-map-id" name="my-map">
  <area href="uri1" shape="rect" coords="1,1,51,76"/>
  <area href="uri2" shape="rect" coords="54,4,94,74"/>
</map>
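The correspondence between the two forms is mechanical: the <map-group> id becomes the XHTML map id and the usemap target, map-name becomes name, and each area's map-shape, map-coords, and xlink:href become shape, coords, and href. A sketch of that mapping in Python (not the Bookshelf XSLT); the xlink namespace declaration is added to the sample only so it parses on its own, and map_group_to_xhtml is a hypothetical name.

```python
import xml.etree.ElementTree as ET

XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

MAP_GROUP = """<map-group id="my-map-id" xmlns:xlink="http://www.w3.org/1999/xlink">
  <graphic xlink:href="img-uri"/>
  <map map-name="my-map">
    <area map-shape="rect" map-coords="1,1,51,76" xlink:href="uri1"/>
    <area map-shape="rect" map-coords="54,4,94,74" xlink:href="uri2"/>
  </map>
</map-group>"""


def map_group_to_xhtml(xml_text: str) -> list:
    """Convert a <map-group> into an XHTML <img> element and a <map> element."""
    group = ET.fromstring(xml_text)
    map_id = group.get("id")
    img = ET.Element("img", {"src": group.find("graphic").get(XLINK_HREF),
                             "usemap": "#" + map_id})
    source_map = group.find("map")
    html_map = ET.Element("map", {"id": map_id, "name": source_map.get("map-name")})
    for area in source_map.findall("area"):
        # Rename the JATS attributes to their XHTML equivalents.
        ET.SubElement(html_map, "area", {
            "href": area.get(XLINK_HREF),
            "shape": area.get("map-shape"),
            "coords": area.get("map-coords"),
        })
    return [img, html_map]
```

Serializing the two returned elements with ET.tostring(..., encoding="unicode") gives markup equivalent to the XHTML shown above.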
<multi-link>
XML
<multi-link>
  <term>IDDM2</term>
  <ext-link ext-link-type="url" xlink:href="LINK1">Bookshelf</ext-link>
  <ext-link ext-link-type="url" xlink:href="LINK2">PubMed Central</ext-link>
  …
</multi-link>
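A <multi-link> ties one term to several targets. The slides do not show how Bookshelf renders it, so the sketch below simply assumes one plausible HTML output: the term followed by a list of links. LINK1 and LINK2 are kept as the placeholders from the example, and the class names and render_multi_link are hypothetical.

```python
import xml.etree.ElementTree as ET

XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

MULTI_LINK = """<multi-link xmlns:xlink="http://www.w3.org/1999/xlink">
  <term>IDDM2</term>
  <ext-link ext-link-type="url" xlink:href="LINK1">Bookshelf</ext-link>
  <ext-link ext-link-type="url" xlink:href="LINK2">PubMed Central</ext-link>
</multi-link>"""


def render_multi_link(xml_text: str) -> str:
    """Render a <multi-link> as its term followed by a list of its targets."""
    node = ET.fromstring(xml_text)
    wrapper = ET.Element("div", {"class": "multi-link"})
    ET.SubElement(wrapper, "span", {"class": "term"}).text = node.findtext("term")
    ul = ET.SubElement(wrapper, "ul")
    for link in node.findall("ext-link"):
        # One list item per target, keeping the link text from the source.
        anchor = ET.SubElement(ET.SubElement(ul, "li"), "a", {"href": link.get(XLINK_HREF)})
        anchor.text = link.text
    return ET.tostring(wrapper, encoding="unicode")
```

A pop-up menu or tooltip would be equally valid renderings; the point is that the source XML records the grouping and leaves presentation to the stylesheet.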
DTD v3.0 Attributes
Article: abbrev-type, article-type, response-type
Book: alternate-form-type, book-id, book-part-number, book-part-type, graphic-type (obsolete), indexed, map-alt, map-coords, map-name, map-shape, primary, qualifier, taxonomic-id
Books & Journals in PubMed Central
Source Conversion
(1) Third-party vendor services: tagging rules for journals can be applied to book content, especially for lower-level document objects:
• Citations
• Figures
• Tables
(2) In-house conversion: for content submitted in external DTDs, PMC journal modules are reused to handle:
• Dates
• Strings
• CALS-to-XHTML table conversion
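CALS-to-XHTML table conversion is largely a renaming of structure: tgroup sections map onto thead, tfoot, and tbody, row onto tr, and entry onto th or td. A reduced sketch in Python, not the PMC module itself; it assumes a single tgroup, converts morerows to rowspan, and does not handle column spans (namest/nameend, spanname). The sample table and the name cals_to_xhtml are hypothetical.

```python
import xml.etree.ElementTree as ET

CALS_TABLE = """<table frame="all">
  <tgroup cols="2">
    <thead><row><entry>Column A</entry><entry>Column B</entry></row></thead>
    <tbody>
      <row><entry morerows="1">a1</entry><entry>b1</entry></row>
      <row><entry>b2</entry></row>
    </tbody>
  </tgroup>
</table>"""


def cals_to_xhtml(cals_xml: str) -> ET.Element:
    """Convert a single-tgroup CALS table into an XHTML <table> element."""
    tgroup = ET.fromstring(cals_xml).find("tgroup")
    table = ET.Element("table")
    # CALS and XHTML 1.0 both order the sections thead, tfoot, tbody.
    for section in ("thead", "tfoot", "tbody"):
        source = tgroup.find(section)
        if source is None:
            continue
        target = ET.SubElement(table, section)
        cell_tag = "th" if section == "thead" else "td"
        for row in source.findall("row"):
            tr = ET.SubElement(target, "tr")
            for entry in row.findall("entry"):
                cell = ET.SubElement(tr, cell_tag)
                cell.text = entry.text
                cell.extend(list(entry))  # carry inline child markup across
                morerows = entry.get("morerows")
                if morerows:
                    # morerows counts the extra rows; rowspan counts all rows.
                    cell.set("rowspan", str(int(morerows) + 1))
    return table
```

Column spans are the part that needs real care in production code, because CALS expresses them through colspec names rather than a simple count.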
Data Processing and Ingest
• Software to look up PubMed IDs for citations and tag them as <pub-id pub-id-type="pmid">
• Image-resizing software and validation checks for graphics and supplementary data files such as PDFs
• Loading code that extracts key information, such as dates, subject categories, etc.
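The PubMed ID step amounts to matching each citation against an index and appending a <pub-id pub-id-type="pmid"> to the ones that match. The sketch below replaces the real lookup service with a hypothetical in-memory index keyed on normalized article title and year; the matching key, the placeholder PMID, and add_pmids are all assumptions, and only <element-citation> is handled.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for a PubMed lookup: (normalized title, year) -> PMID.
# The PMID value is a placeholder, not a real identifier.
PMID_INDEX = {("an example article title", "2005"): "00000000"}

REF_LIST = """<ref-list>
  <ref id="r1"><element-citation publication-type="journal">
    <article-title>An Example Article Title</article-title><year>2005</year>
  </element-citation></ref>
  <ref id="r2"><element-citation publication-type="journal">
    <article-title>Unmatched Title</article-title><year>1999</year>
  </element-citation></ref>
</ref-list>"""


def add_pmids(ref_list_xml: str, index: dict) -> ET.Element:
    """Append <pub-id pub-id-type="pmid"> to each citation found in the index."""
    root = ET.fromstring(ref_list_xml)
    for citation in root.iter("element-citation"):
        # Leave citations that already carry a PMID untouched.
        if citation.find("pub-id[@pub-id-type='pmid']") is not None:
            continue
        title = (citation.findtext("article-title") or "").strip().lower()
        pmid = index.get((title, citation.findtext("year")))
        if pmid:
            ET.SubElement(citation, "pub-id", {"pub-id-type": "pmid"}).text = pmid
    return root
```

A real matcher would use more citation fields (journal, volume, first page) than title and year, but the write-back into <pub-id> is the same.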
Output Formats
HTML
Uses the base XSLT article rendering rules to convert XML to HTML, with book-specific overrides or modifications
PDF
Uses the base XSL-FO code for articles, with book-specific overrides or modifications
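The slides do not show the stylesheets. In XSLT, one common way to get "base rules plus overrides" is a book stylesheet that uses xsl:import to pull in the article stylesheet, because templates in the importing stylesheet take precedence over imported ones. The Python sketch below mimics only that lookup order with collections.ChainMap; the rule tables, class name, and render function are hypothetical.

```python
from collections import ChainMap
from html import escape

# Hypothetical base rules shared with the article pipeline: element -> renderer.
ARTICLE_RULES = {
    "title": lambda text: f"<h2>{escape(text)}</h2>",
    "p": lambda text: f"<p>{escape(text)}</p>",
}

# Book-specific overrides; any element not listed falls back to the article rule.
BOOK_OVERRIDES = {
    "title": lambda text: f'<h2 class="book-part-title">{escape(text)}</h2>',
}

# Lookups try BOOK_OVERRIDES first, then ARTICLE_RULES.
BOOK_RULES = ChainMap(BOOK_OVERRIDES, ARTICLE_RULES)


def render(element_name: str, text: str, rules=BOOK_RULES) -> str:
    """Render one element's text with the highest-precedence matching rule."""
    return rules[element_name](text)
```

render("title", "Preface") picks up the book override, while render("p", ...) still uses the shared article rule, so fixes to the article rules reach books automatically.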
Advantages of using a Shared Tag Set
• Share XSLT modules across ingest, conversion, and rendering
• Use similar database infrastructure
• Enable closer integration for a variety of processes, such as PubMed submission and indexing
Submission of Content to Bookshelf
• PDF or Word
• XML in the NLM Book DTD
• XML in external DTDs
• Word authoring followed by conversion to XML (in-house)
<book>
Submitted files: PDF, Word, XML (external DTD), NLM Book DTD XML
Third-party vendor or in-house converters
Requirements: pass validation; pass stylecheck
<book-part>
CMS: <book> → CHOP-IT-UP → <book-part>, <book-part>, <book-part> → PMC
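CHOP-IT-UP turns one <book> into separate <book-part> units for PMC. How the real tool carries book-level context is not described here, so this sketch assumes one simple choice: each unit becomes a standalone <book> holding a copy of <book-meta> and a body with just that part. The sample book and chop_it_up are hypothetical.

```python
import copy
import xml.etree.ElementTree as ET

BOOK = """<book>
  <book-meta><book-title-group><book-title>Example Book</book-title></book-title-group></book-meta>
  <body>
    <book-part id="ch1" book-part-type="chapter"><body><p>Chapter one.</p></body></book-part>
    <book-part id="ch2" book-part-type="chapter"><body><p>Chapter two.</p></body></book-part>
  </body>
</book>"""


def chop_it_up(book_xml: str) -> dict:
    """Split a <book> into one standalone document per top-level <book-part>."""
    book = ET.fromstring(book_xml)
    book_meta = book.find("book-meta")
    units = {}
    for part in book.findall("body/book-part"):
        unit = ET.Element("book")
        # Each unit gets its own copy of the book-level metadata.
        if book_meta is not None:
            unit.append(copy.deepcopy(book_meta))
        ET.SubElement(unit, "body").append(copy.deepcopy(part))
        units[part.get("id")] = ET.tostring(unit, encoding="unicode")
    return units
```

Keying the units by book-part id keeps them addressable, so each can be loaded, indexed, and linked individually.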
Word Authoring Followed by Conversion to XML
Microsoft Word document → NCBI Word converter → XML → instant HTML preview → publish to Bookshelf
Stylechecker
Checks business rules
Goal: one set of rendering rules for uniform source XML data
Two checkpoints:
• Whole book (modified article stylechecker)
• Individual book-part (article stylechecker)
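A stylechecker differs from DTD validation in that it enforces business rules the DTD allows but the rendering pipeline depends on. The two rules below are invented for illustration (the actual Bookshelf rules are not listed in the slides); because ElementTree's iter() includes the root, the same function serves both checkpoints, a whole <book> or a standalone <book-part>.

```python
import xml.etree.ElementTree as ET


def check_book_part_ids(root):
    """Hypothetical rule: every <book-part> needs an id for linking."""
    return [f"<{el.tag}> is missing an id"
            for el in root.iter("book-part") if not el.get("id")]


def check_table_captions(root):
    """Hypothetical rule: every <table-wrap> needs a <caption>."""
    return [f"table-wrap {el.get('id') or '(no id)'} has no <caption>"
            for el in root.iter("table-wrap") if el.find("caption") is None]


RULES = [check_book_part_ids, check_table_captions]


def stylecheck(xml_text: str, rules=RULES) -> list:
    """Run every rule over the document and collect the error messages."""
    root = ET.fromstring(xml_text)
    errors = []
    for rule in rules:
        errors.extend(rule(root))
    return errors
```

An empty result means the document passes; any messages block it from loading, alongside the separate validation step.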
Integrating Content from Different Databases
Processing Instruction in Source XML
<!DOCTYPE sec SYSTEM "book.dtd">
<sec>
  <title/>
  <?get-external-xml molgen.tables?>

Data in the JATS Book DTD Delivered from External Database
<sec id="molgen.tables">
  <title/>
  <p content-type="molecular_genetics"><italic>Information in the Molecular Genetics and OMIM tables may differ from that elsewhere in the GeneReview: tables may contain more recent information. —</italic>ED.</p>
  <table-wrap id="pkd-ar.molgen.TA" position="anchor">
    <caption><p>Table A. Polycystic Kidney Disease, Autosomal Recessive: Genes and Databases</p></caption>
    <table>
      <tbody>
        <tr><th>Gene Symbol</th><th>Chromosomal Locus</th><th>Protein Name</th><th>Locus Specific</th><th>HGMD</th></tr>
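The processing instruction acts as a placeholder: at processing time its data (molgen.tables) names the content to fetch, and the fetched <sec> takes the PI's place. A sketch of that substitution with ElementTree, which keeps PIs in the tree only when TreeBuilder(insert_pis=True) is used (Python 3.8+). The in-memory EXTERNAL_XML store stands in for the external database and is hypothetical, as is resolve_external_xml; PIs inside fetched content are not resolved.

```python
import xml.etree.ElementTree as ET

PI_TARGET = "get-external-xml"

# Hypothetical stand-in for the external database, keyed by the PI's data.
EXTERNAL_XML = {
    "molgen.tables": '<sec id="molgen.tables"><title/><p>Rows from the external database.</p></sec>',
}

SOURCE_XML = "<sec><title/><?get-external-xml molgen.tables?></sec>"


def resolve_external_xml(source_xml: str, fetch) -> ET.Element:
    """Replace each <?get-external-xml KEY?> with the element parsed from fetch(KEY)."""
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_pis=True))
    root = ET.fromstring(source_xml, parser=parser)
    for parent in list(root.iter()):
        for index, child in enumerate(list(parent)):
            if child.tag is not ET.ProcessingInstruction:
                continue
            # A PI element's text is "target data", e.g. "get-external-xml molgen.tables".
            target, _, key = (child.text or "").partition(" ")
            if target != PI_TARGET:
                continue
            replacement = ET.fromstring(fetch(key.strip()))
            replacement.tail = child.tail  # keep whitespace that followed the PI
            parent[index] = replacement
    return root
```

Passing fetch as a callable keeps the resolver independent of where the external XML actually lives.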