Bookshelf: Leafing through XML

NLM Journal Article Tag Suite Conference 2010

Martin Latterner and Marilu Hoeppner

National Center for Biotechnology Information

National Library of Medicine

Latterner M and Hoeppner MA. Bookshelf: Leafing through XML. JATS-CON 2010

NLM Book DTD v2.3

NLM Collection Catalog

PubMed Abstracts

Electronic Literature

Archive

Books, Monographs, Reports

Journals

Other publication formats

Book chapters, Monographs, Reports

Books in PubMed

Non-PubMed Books

User guides, Documentation

Journal articles PMC Journals PubMed Central

Bookshelf

Entrez Literature Resources

• Features of the Book DTD
• Books and journals within PubMed Central
• Bookshelf Workflows
• Integration of information between databases

NCBI Book DTD 1.0

Based on ISO 12083 Article DTD

Modifications

• Allowed icon as a child of exlnk.
• Allowed pre as a child of entry.
• Allowed glossary as a child of chapter.
• Added type: ppt.
• Added attributes id and BID to <foot>.
• Added attribute id to <p>.
• Added <title>, child of <bibsect>.
• Added <bb>, <gf> and <figgrp> as children of <linkgrp>.
• Added <email> as child of <txtstyle>.
• Added <pdf> as child of <glossary>.
• Added <figgrp1> as child of <entry>.
• …

March 2003: v1.0

December 2004: v2.0

November 2005: v2.1

BOOKSHELF XML DATA

NCBI BOOK DTD

Book DTD of the NLM Journal Article Tag Suite

Designed to capture the semantic elements of the content, not form

e.g. bibliographic metadata

<front>
  <div type="titlepage" level="1" id="2001902bddd00001">
    <booktitle>
      <ils style="strong">CONFLICT OF INTEREST IN MEDICAL RESEARCH</ils>
    </booktitle>
    <bookauthor>
      <bookauthor.name>Committee on Conflict of Interest in Medical Research</bookauthor.name>
      <bookauthor.info>Board on Health Sciences Policy</bookauthor.info>
      <bookauthor.info>INSTITUTE OF MEDICINE <ils style="smallcap"> <ils style="emphasis"> OF THE NATIONAL ACADEMIES</ils> </ils> </bookauthor.info>
    </bookauthor>
    <publication.stmt>
      <p style="center">
        <publisher>
          <publisher.name>THE NATIONAL ACADEMIES PRESS</publisher.name>
          <publisher.address><state>Washington, D.C.</state></publisher.address>
        </publisher>
      </p>
    </publication.stmt>
    <page number="ii" id="2001902bppp00002"/>
  </div>
  <div type="copyrightpage" level="1" id="2001902bddd00002">
    <publication.stmt>
      <p style="normal">
        <publisher>
          <publisher.name><ils style="strong">THE NATIONAL ACADEMIES PRESS</ils></publisher.name>
          <publisher.address>
            <street><ils style="strong">500 Fifth Street, N.W.</ils></street>
            <state><ils style="strong">Washington, DC</ils></state>
            <postcode><ils style="strong">20001</ils></postcode>
          </publisher.address>
        </publisher>
      </p>
    </publication.stmt>
    <publication.stmt>
      <p style="flindent">ISBN <isbn>978-0-309-13188-9</isbn> (hardcover)</p>
    </publication.stmt>
    <copyright>Copyright <copyright.year>2009</copyright.year> by the <copyright.holder>National Academy of Sciences</copyright.holder>. All rights reserved.</copyright>
    <printinfo>
      <print>Printed in the United States of America</print>
    </printinfo>
  </div>
</front>

<book-meta>
  <book-title-group>
    <book-title>Conflict of Interest in Medical Research</book-title>
  </book-title-group>
  <contrib-group>
    <contrib contrib-type="author">
      <collab>Institute of Medicine (US) Committee on Conflict of Interest in Medical Research, Education, and Practice</collab>
    </contrib>
  </contrib-group>
  <publisher>
    <publisher-name>National Academies Press (US)</publisher-name>
    <publisher-loc>Washington (DC)</publisher-loc>
  </publisher>
  <isbn>978-0-309-13188-9</isbn>
  <pub-date pub-type="ppub">
    <year>2009</year>
  </pub-date>
  <permissions>
    <copyright-statement>Copyright &copy; 2009, National Academy of Sciences</copyright-statement>
    <copyright-year>2009</copyright-year>
  </permissions>
</book-meta>

More granular text descriptions are handled at attribute level

e.g. preface, foreword

<sec sec-type="preface">

DTD v3.0 Elements

Article: <abbrev-journal-title>, <article>, <article-categories>, <article-id>, <article-meta>, <conf-acronym>, <conference>, <conf-num>, <conf-theme>, <floats-group>, <front>, <front-stub>, <issue-sponsor>, <journal-meta>, <journal-subtitle>, <journal-title>, <journal-title-group>, <response>, <series-text>, <series-title>, <string-conf>, <sub-article>, <unstructured-kwd-group>, <x>

Book: <alternate-form>, <area>, <book>, <book-front>, <book-meta>, <book-part>, <book-part-categories>, <book-part-meta>, <book-title>, <book-title-group>, <collection>, <collection-id>, <collection-list>, <collection-member>, <collection-meta>, <collection-name>, <map>, <map-group>, <multi-link>

<map-group>

XML

<map-group id="my-map-id">
  <graphic xlink:href="img-uri"/>
  <map map-name="my-map">
    <area map-shape="rect" map-coords="1,1,51,76" xlink:href="uri1"/>
    <area map-shape="rect" map-coords="54,4,94,74" xlink:href="uri2"/>
  </map>
</map-group>

XHTML

<img src="img-uri" usemap="#my-map-id"/>
<map id="my-map-id" name="my-map">
  <area href="uri1" shape="rect" coords="1,1,51,76"/>
  <area href="uri2" shape="rect" coords="54,4,94,74"/>
</map>
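The <map-group> to XHTML mapping on this slide can be sketched in a few lines of Python. This is only an illustration of the attribute correspondence (map-shape to shape, map-coords to coords, xlink:href to href or src), not the rendering code used by Bookshelf, which the slides describe as XSLT-based. The function name map_group_to_xhtml is hypothetical, and the sample declares the xlink namespace explicitly so that it parses on its own.

```python
import xml.etree.ElementTree as ET

# Namespace-qualified name of the xlink:href attribute.
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"


def map_group_to_xhtml(map_group):
    """Return (img, map) XHTML elements for one <map-group> element.

    Hypothetical helper: it mirrors the attribute mapping shown on the slide.
    """
    group_id = map_group.get("id")
    graphic = map_group.find("graphic")
    source_map = map_group.find("map")

    # The image points at the map through the map-group id, as on the slide.
    img = ET.Element("img", {"src": graphic.get(XLINK_HREF), "usemap": f"#{group_id}"})
    html_map = ET.Element("map", {"id": group_id, "name": source_map.get("map-name")})
    for area in source_map.findall("area"):
        ET.SubElement(html_map, "area", {
            "href": area.get(XLINK_HREF),
            "shape": area.get("map-shape"),
            "coords": area.get("map-coords"),
        })
    return img, html_map


SAMPLE = """<map-group id="my-map-id" xmlns:xlink="http://www.w3.org/1999/xlink">
  <graphic xlink:href="img-uri"/>
  <map map-name="my-map">
    <area map-shape="rect" map-coords="1,1,51,76" xlink:href="uri1"/>
    <area map-shape="rect" map-coords="54,4,94,74" xlink:href="uri2"/>
  </map>
</map-group>"""

img, html_map = map_group_to_xhtml(ET.fromstring(SAMPLE))
print(ET.tostring(img, encoding="unicode"))
print(ET.tostring(html_map, encoding="unicode"))
```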

<multi-link>

XML

<multi-link>
  <term>IDDM2</term>
  <ext-link ext-link-type="url" xlink:href="LINK1">Bookshelf</ext-link>
  <ext-link ext-link-type="url" xlink:href="LINK2">PubMed Central</ext-link>
  …
</multi-link>

DTD v3.0 Attributes

Article: abbrev-type, article-type, response-type

Book: alternate-form-type, book-id, book-part-number, book-part-type, graphic-type (obsolete), indexed, map-alt, map-coords, map-name, map-shape, primary, qualifier, taxonomic-id

Books & Journals in PubMed Central

Source Conversion

(1) Third-party vendor services: Tagging rules for journals can be applied to book content, especially for lower-level document objects:

• Citations
• Figures
• Tables

(2) In-house conversion: For content submitted in external DTDs, PMC journal modules are reused to handle:

• Dates
• Strings
• CALS to XHTML table conversion
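As an illustration of the last item, a CALS to XHTML table conversion can be sketched as follows. This is a minimal sketch under stated assumptions, not the PMC module itself: it assumes a single <tgroup>, copies only cell text, and ignores column specifications and spanning (colspec, namest/nameend, morerows). The function name cals_to_xhtml and the sample values are hypothetical.

```python
import xml.etree.ElementTree as ET


def cals_to_xhtml(cals_table):
    """Convert a simple CALS <table> into an XHTML <table> element.

    Assumptions: one <tgroup>; no spanning cells; cell markup is flattened to text.
    """
    html_table = ET.Element("table")
    tgroup = cals_table.find("tgroup")
    if tgroup is None:
        return html_table

    for section_name in ("thead", "tbody"):
        section = tgroup.find(section_name)
        if section is None:
            continue
        html_section = ET.SubElement(html_table, section_name)
        # CALS uses <entry> for every cell; XHTML distinguishes <th> and <td>.
        cell_tag = "th" if section_name == "thead" else "td"
        for row in section.findall("row"):
            tr = ET.SubElement(html_section, "tr")
            for entry in row.findall("entry"):
                cell = ET.SubElement(tr, cell_tag)
                cell.text = "".join(entry.itertext())
    return html_table


SAMPLE_CALS = """<table>
  <tgroup cols="2">
    <thead><row><entry>Gene Symbol</entry><entry>Protein Name</entry></row></thead>
    <tbody><row><entry>GENE1</entry><entry>Example protein</entry></row></tbody>
  </tgroup>
</table>"""

xhtml_table = cals_to_xhtml(ET.fromstring(SAMPLE_CALS))
print(ET.tostring(xhtml_table, encoding="unicode"))
```

A production converter would also need to translate spans into colspan and rowspan attributes, which is where most of the real complexity lies.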

Data Processing and Ingest

Software to look up PubMed IDs in citations: <pub-id pub-id-type="pmid">

Image resizing software and validation checks for graphics and supplementary data files such as PDFs

Loading code for the extraction of key information, such as dates and subject categories
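The kind of extraction mentioned in the last point might look like the sketch below. It pulls a few fields out of a <book-meta> element like the one shown earlier in the deck. The function name extract_key_info and the choice of fields are assumptions made for illustration; the actual loading code is not shown in the slides.

```python
import xml.etree.ElementTree as ET


def extract_key_info(book_meta):
    """Collect a few key fields from a <book-meta> element (hypothetical helper)."""
    return {
        "title": book_meta.findtext("book-title-group/book-title"),
        "publisher": book_meta.findtext("publisher/publisher-name"),
        "isbn": book_meta.findtext("isbn"),
        # Print publication year, selected by the pub-type attribute.
        "year": book_meta.findtext("pub-date[@pub-type='ppub']/year"),
    }


# Trimmed copy of the <book-meta> example from the slides.
SAMPLE_META = """<book-meta>
  <book-title-group>
    <book-title>Conflict of Interest in Medical Research</book-title>
  </book-title-group>
  <publisher>
    <publisher-name>National Academies Press (US)</publisher-name>
    <publisher-loc>Washington (DC)</publisher-loc>
  </publisher>
  <isbn>978-0-309-13188-9</isbn>
  <pub-date pub-type="ppub">
    <year>2009</year>
  </pub-date>
</book-meta>"""

info = extract_key_info(ET.fromstring(SAMPLE_META))
print(info)
```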

CHOP-IT-UP

Output Formats

HTML

Uses the base XSLT article rendering rules to convert XML to HTML, with book-specific overrides or modifications

PDF

Uses the XSL-FO base code for articles, with book-specific overrides or modifications

Advantages of using a Shared Tag Set

• Share XSLT modules during ingest, conversion processes, and rendering
• Use similar database infrastructure
• Enable closer integration for a variety of processes, such as PubMed submission and indexing

Bookshelf Workflows

Submission of Content to Bookshelf

• PDF or Word
• XML in NLM Book DTD
• XML in external DTDs
• Word authoring followed by conversion to XML (in-house)

<book>

Submitted Files: PDF, Word, XML (External DTD), NLM Book DTD XML

Third-party vendor or in-house converters

Requirements: pass validation, pass stylecheck

<book> → CHOP-IT-UP → <book-part>, <book-part>, <book-part>

Destinations shown: PMC, CMS
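The slides do not show how CHOP-IT-UP is implemented. The following is only a rough sketch of the idea of splitting one <book> into standalone <book-part> documents, assuming the parts sit directly under the book's <body> and carry an id attribute. The function name chop_book and the sample book are hypothetical.

```python
import copy
import xml.etree.ElementTree as ET


def chop_book(book):
    """Return {book-part id: serialized XML} for each top-level <book-part>.

    Assumption: parts are direct children of <book>/<body>; nested parts stay
    inside their parent part.
    """
    chunks = {}
    for position, part in enumerate(book.findall("body/book-part"), start=1):
        # Fall back to a positional key when a part has no id.
        part_id = part.get("id") or f"part{position}"
        standalone = copy.deepcopy(part)
        # Drop trailing whitespace that belongs to the parent document.
        standalone.tail = None
        chunks[part_id] = ET.tostring(standalone, encoding="unicode")
    return chunks


SAMPLE_BOOK = """<book>
  <book-meta><book-title-group><book-title>Sample Book</book-title></book-title-group></book-meta>
  <body>
    <book-part id="ch1" book-part-type="chapter"><body><p>First chapter.</p></body></book-part>
    <book-part id="ch2" book-part-type="chapter"><body><p>Second chapter.</p></body></book-part>
  </body>
</book>"""

chunks = chop_book(ET.fromstring(SAMPLE_BOOK))
for part_id, xml_text in chunks.items():
    print(part_id, xml_text)
```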

Word Authoring Followed by Conversion to XML

Microsoft Word document → NCBI Word converter → XML → Instant HTML Preview → Publish to Bookshelf

Stylechecker

Check business rules

Goal: one set of rendering rules for uniform source XML data

2 Checkpoints

Whole book (modified article stylechecker)

Individual book-part (article stylechecker)
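To make the two checkpoints concrete, here is a small sketch of what business-rule checks at each level could look like. The rules themselves (a book needs a title, book-part ids must be unique, each book-part needs an id and a title) are invented for illustration and are not the actual Bookshelf stylechecker rules. The functions check_book and check_book_part are hypothetical names.

```python
import xml.etree.ElementTree as ET


def check_book_part(part):
    """Checkpoint for an individual <book-part>; returns a list of error messages."""
    errors = []
    part_id = part.get("id")
    if not part_id:
        errors.append("book-part is missing an id attribute")
    title = part.findtext("book-part-meta/title-group/title")
    if not (title and title.strip()):
        errors.append(f"book-part {part_id or '?'} has no title")
    return errors


def check_book(book):
    """Checkpoint for the whole <book>; returns a list of error messages."""
    errors = []
    title = book.findtext("book-meta/book-title-group/book-title")
    if not (title and title.strip()):
        errors.append("book has no book-title")
    ids = [p.get("id") for p in book.iter("book-part") if p.get("id")]
    for duplicate in sorted({i for i in ids if ids.count(i) > 1}):
        errors.append(f"duplicate book-part id: {duplicate}")
    return errors


SAMPLE_BOOK = """<book>
  <book-meta><book-title-group><book-title>Sample Book</book-title></book-title-group></book-meta>
  <body>
    <book-part id="ch1">
      <book-part-meta><title-group><title>Introduction</title></title-group></book-part-meta>
    </book-part>
    <book-part id="ch1"><book-part-meta/></book-part>
  </body>
</book>"""

book = ET.fromstring(SAMPLE_BOOK)
book_errors = check_book(book)
part_errors = [check_book_part(part) for part in book.iter("book-part")]
print(book_errors)
print(part_errors)
```

Returning messages rather than raising on the first failure lets a single run report every problem in a submission.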

Integrating Content from Different Databases

Processing Instruction in Source XML

<?get-external-xml molgen.tables?>

Data in the JATS Book DTD Delivered from External Database

<!DOCTYPE sec SYSTEM "book.dtd">
<sec>
  <title/>
  <sec id="molgen.tables">
    <title/>
    <p content-type="molecular_genetics"><italic>Information in the Molecular Genetics and OMIM tables may differ from that elsewhere in the GeneReview: tables may contain more recent information. &#x02014;</italic>ED.</p>
    <table-wrap id="pkd-ar.molgen.TA" position="anchor">
      <caption><p>Table A. Polycystic Kidney Disease, Autosomal Recessive: Genes and Databases</p></caption>
      <table>
        <tbody>
          <tr><th>Gene Symbol</th><th>Chromosomal Locus</th><th>Protein Name</th><th>Locus Specific</th><th>HGMD</th></tr>
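One way to picture how a get-external-xml processing instruction could be resolved is sketched below. This is an assumption-laden illustration, not the Bookshelf implementation: the external database is stood in for by a plain dictionary, the function name resolve_external_xml is hypothetical, and the parser setup relies on the insert_pis option of xml.etree.ElementTree.TreeBuilder (Python 3.8 or later).

```python
import xml.etree.ElementTree as ET

PI_TARGET = "get-external-xml"


def resolve_external_xml(source_xml, fragments):
    """Replace <?get-external-xml KEY?> instructions with fragments[KEY].

    `fragments` stands in for the external database (an assumption).
    """
    # Keep processing instructions in the tree instead of discarding them.
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_pis=True))
    root = ET.fromstring(source_xml, parser=parser)

    for parent in list(root.iter()):
        for index, child in enumerate(list(parent)):
            if child.tag is not ET.ProcessingInstruction:
                continue
            # PI text is "target data", e.g. "get-external-xml molgen.tables".
            target, _, key = (child.text or "").partition(" ")
            key = key.strip()
            if target != PI_TARGET or key not in fragments:
                continue
            replacement = ET.fromstring(fragments[key])
            replacement.tail = child.tail  # preserve text following the PI
            parent[index] = replacement
    return root


SAMPLE_SOURCE = """<sec id="molgen">
  <title>Molecular Genetics</title>
  <?get-external-xml molgen.tables?>
</sec>"""

FRAGMENTS = {"molgen.tables": '<sec id="molgen.tables"><title/></sec>'}

resolved = resolve_external_xml(SAMPLE_SOURCE, FRAGMENTS)
print(ET.tostring(resolved, encoding="unicode"))
```

Instructions whose key is not found are left in place, so unresolved references remain visible in the output rather than silently disappearing.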
