
Engineering Next-Generation Publishing Workflows

IDPF Digital Book 2013, May 30, 2013

Sanders Kleinfeld, O’Reilly Media, Inc.

How do you write a book?

How do you write a “book”?

How do you write an (e)book?

How do you “write” an (e)book?

Anatomy of an ebook: EPUB

What you see: the rendered chapter page (screenshot not reproduced)

What’s inside:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>Chapter 1. A Python Q&amp;A Session</title>
    <link rel="stylesheet" href="core.css" type="text/css" />
    <meta name="generator" content="DocBook XSL Stylesheets V1.74.0" />
  </head>
  <body>
    <div class="chapter" title="Chapter 1. A Python Q&amp;A Session">
      <div class="titlepage">
        <div>
          <div>
            <h1 class="title"><a id="a_python_q_ampersand_a_session"></a>Chapter 1. A Python Q&amp;A Session</h1>
          </div>
        </div>
      </div>
      <p>If you’ve bought this book, you may already know what Python is and why it’s an important tool to learn. If you don’t, you probably won’t be sold on Python until you’ve learned the language by reading the rest of this book and have done a project or two. But before we jump into details, the first few pages of this book will briefly introduce some of the main reasons behind Python’s popularity. To begin sculpting a definition of Python, this chapter takes the form of a question-and-answer session, which poses some of the most common questions asked by beginners.</p>
      …

An Inconvenient Truth: Ebooks are made of code. If you are an ebook publisher, you are in the software-development business.
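Because an ebook is code, you can open one up with code. An EPUB file is a ZIP archive that packages XHTML like the chapter above together with CSS and XML metadata. The minimal sketch below uses only Python's standard `zipfile` module: it builds a tiny, deliberately incomplete EPUB-style container in memory (the file names `OEBPS/ch01.html` and `OEBPS/core.css` and the placeholder `container.xml` body are illustrative assumptions; a real EPUB also needs a package document) and then lists what is inside.

```python
import io
import zipfile

def build_toy_epub() -> bytes:
    """Build a tiny EPUB-style ZIP in memory. Illustrative only: not a valid EPUB."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # The EPUB container format requires "mimetype" to be the first entry,
        # stored uncompressed, so reading systems can identify the file type.
        zf.writestr("mimetype", "application/epub+zip", compress_type=zipfile.ZIP_STORED)
        # Placeholder contents; a real container.xml points to a package (.opf) file.
        zf.writestr("META-INF/container.xml", "<container/>")
        zf.writestr("OEBPS/ch01.html", "<html><body><p>Chapter 1</p></body></html>")
        zf.writestr("OEBPS/core.css", "p { margin: 0; }")
    return buf.getvalue()

def list_contents(epub_bytes: bytes) -> list:
    """Return the names of all files packed inside an EPUB (or any ZIP)."""
    with zipfile.ZipFile(io.BytesIO(epub_bytes)) as zf:
        return zf.namelist()

for name in list_contents(build_toy_epub()):
    print(name)
```

The same `zipfile.ZipFile(...).namelist()` call works on any real `.epub` on disk, which is a quick way to see that an ebook is a bundle of source files.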

How do you “write” an (e)book?

How do you develop an (e)book?

Five Key Principles of a Modern (e)Book Workflow

#1. Semantic Markup Matters

#2. Single Source, Multiple Outputs

#3. Automate Your Headaches Away

#4. Versioning is the New Spell-Check

#5. Always Think “Digital First”

#1 Semantic Markup Matters

First Chapter of My Memoirs, as written in Microsoft Word (screenshot not reproduced)

Underlying Representation of Content (Word XML):

<w:body>
  <w:p w:rsidR="0073527D" w:rsidRDefault="007F1550" w:rsidP="007F1550">
    <w:pPr><w:jc w:val="right"/><w:rPr><w:sz w:val="96"/><w:szCs w:val="96"/></w:rPr></w:pPr>
    <w:r w:rsidRPr="007F1550"><w:rPr><w:sz w:val="96"/><w:szCs w:val="96"/></w:rPr>
      <w:t>1</w:t>
    </w:r>
  </w:p>
  <w:p w:rsidR="007F1550" w:rsidRDefault="007F1550" w:rsidP="007F1550">
    <w:pPr><w:jc w:val="right"/><w:rPr><w:sz w:val="72"/><w:szCs w:val="72"/></w:rPr></w:pPr>
    <w:r w:rsidRPr="007F1550"><w:rPr><w:sz w:val="72"/><w:szCs w:val="72"/></w:rPr>
      <w:t>Autobiography of Me</w:t>
    </w:r>
  </w:p>
  <w:p w:rsidR="007F1550" w:rsidRPr="007F1550" w:rsidRDefault="007F1550" w:rsidP="007F1550">
    <w:pPr><w:jc w:val="right"/><w:rPr><w:sz w:val="72"/><w:szCs w:val="72"/></w:rPr></w:pPr>
  </w:p>
  <w:p w:rsidR="007F1550" w:rsidRPr="00032659" w:rsidRDefault="007F1550" w:rsidP="007F1550">
    <w:pPr><w:rPr><w:sz w:val="48"/><w:szCs w:val="48"/></w:rPr></w:pPr>
    <w:r w:rsidRPr="00032659"><w:rPr><w:sz w:val="48"/><w:szCs w:val="48"/></w:rPr>
      <w:t xml:space="preserve">I was born in 1980, I love chocolate ice cream, and I am a </w:t>
    </w:r>
    <w:r w:rsidRPr="00032659"><w:rPr><w:i/><w:sz w:val="48"/><w:szCs w:val="48"/></w:rPr>
      <w:t>wicked awesome</w:t>
    </w:r>
    <w:r w:rsidRPr="00032659"><w:rPr><w:sz w:val="48"/><w:szCs w:val="48"/></w:rPr>
      <w:t xml:space="preserve"> writer, </w:t>
    </w:r>
    <w:proofErr w:type="spellStart"/>
    <w:r w:rsidRPr="00032659"><w:rPr><w:sz w:val="48"/><w:szCs w:val="48"/></w:rPr>
      <w:t>yo</w:t>
    </w:r>
    <w:proofErr w:type="spellEnd"/>
    …

Three Problems with This XML

•  Markup is not semantic!

•  It conflates content and presentation

•  Um, yuck ☹
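To see why presentational markup is painful to work with, here is a minimal Python sketch that reads a trimmed-down version of the Word XML above with the standard `xml.etree.ElementTree` module. The only clue that “Autobiography of Me” is a title is its font size (`w:sz`, measured in half-points), so code has to guess structure from formatting. The `guess_structure` helper and its 60-half-point threshold are hypothetical, chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# WordprocessingML namespace behind the w: prefix in .docx files
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}

# A trimmed-down version of the Word XML shown above
DOC = f"""<w:body xmlns:w="{W}">
  <w:p><w:r><w:rPr><w:sz w:val="72"/></w:rPr><w:t>Autobiography of Me</w:t></w:r></w:p>
  <w:p>
    <w:r><w:rPr><w:sz w:val="48"/></w:rPr><w:t xml:space="preserve">I am a </w:t></w:r>
    <w:r><w:rPr><w:i/><w:sz w:val="48"/></w:rPr><w:t>wicked awesome</w:t></w:r>
    <w:r><w:rPr><w:sz w:val="48"/></w:rPr><w:t xml:space="preserve"> writer</w:t></w:r>
  </w:p>
</w:body>"""

def guess_structure(xml_text: str, heading_threshold: int = 60) -> list:
    """Label each paragraph 'heading' or 'body' by guessing from font size alone."""
    body = ET.fromstring(xml_text)
    result = []
    for p in body.findall("w:p", NS):
        # The text is split across runs (w:r), one per formatting change
        text = "".join(t.text or "" for t in p.iter(f"{{{W}}}t"))
        sizes = [int(sz.get(f"{{{W}}}val")) for sz in p.iter(f"{{{W}}}sz")]
        kind = "heading" if sizes and max(sizes) >= heading_threshold else "body"
        result.append((kind, text))
    return result

print(guess_structure(DOC))
```

Change the title's font size in Word and this guess silently breaks, which is exactly the "conflates content and presentation" problem.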

Semantic Markup in a Nutshell

Semantic markup describes the function of your content, not its formatting. Semantic markup says “This is a section heading,” NOT “This text is in Garamond, 36 pt, bold, center-aligned.”

Semantic Markup Option #1: DocBook

•  DocBook is a semantic markup vocabulary introduced in 1991 (originally as an SGML DTD; today it is maintained as XML)

•  It was primarily designed for representing technical documentation, but is well-suited for representing any prose content

•  DocBook DTDs are available here: http://www.oasis-open.org/docbook/xml/

DocBook Representation of Book Content:

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
  "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd">
<chapter>
  <title>Autobiography of Me</title>
  <para>I was born in 1980, I love chocolate ice cream, and I am a <emphasis>wicked awesome</emphasis> writer, yo!</para>
</chapter>
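With semantic markup, code can ask for content by its function rather than its appearance. A minimal sketch with Python's standard `xml.etree.ElementTree` (the DOCTYPE is omitted here only for brevity):

```python
import xml.etree.ElementTree as ET

CHAPTER = """<chapter>
  <title>Autobiography of Me</title>
  <para>I was born in 1980, I love chocolate ice cream, and I am a
  <emphasis>wicked awesome</emphasis> writer, yo!</para>
</chapter>"""

root = ET.fromstring(CHAPTER)
# Ask for the chapter title by what it is, not by how it looks
title = root.findtext("title")
# Every emphasized phrase anywhere in the chapter
emphasized = [e.text for e in root.iter("emphasis")]
print(title)       # Autobiography of Me
print(emphasized)  # ['wicked awesome']
```

No font sizes or run boundaries are involved: the markup itself says which text is the title.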

Text Editors with GUI DocBook Support

XMLmind XML Editor (http://www.xmlmind.com/xmleditor/)

Oxygen XML Editor (http://www.oxygenxml.com/)

Semantic Markup Option #2: AsciiDoc

•  AsciiDoc is a lightweight, wiki-like markup language for prose content

•  It was created by Stuart Rackham in 2002.

•  The AsciiDoc toolchain is written in Python, and relies heavily on text processing with regular expressions.

AsciiDoc Representation of Book Content:

== Autobiography of Me

I was born in 1980, I love chocolate ice cream, and I am a _wicked awesome_ writer, yo!
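Since AsciiDoc processing leans on regular expressions, a toy converter for just the two constructs above (a `==` section title and `_..._` emphasis) fits in a few lines of Python. This is a hypothetical illustration, not how asciidoc.py is actually structured; the function name `toy_asciidoc_to_docbook` is made up, and it ignores nesting, attributes, and every other AsciiDoc feature.

```python
import re
from xml.sax.saxutils import escape

def toy_asciidoc_to_docbook(src: str) -> str:
    """Convert a '== Title' line plus paragraphs into a DocBook <chapter>.

    Only _emphasis_ is handled inside paragraphs; everything else passes through.
    """
    lines = src.strip().split("\n")
    m = re.match(r"^==\s+(.+)$", lines[0])
    if not m:
        raise ValueError("expected a '== Title' line first")
    out = ["<chapter>", f"  <title>{escape(m.group(1))}</title>"]
    # Paragraphs are separated by blank lines
    body = "\n".join(lines[1:])
    for para in (p.strip() for p in re.split(r"\n\s*\n", body)):
        if not para:
            continue
        # _text_ (not inside a word) becomes <emphasis>text</emphasis>
        para = re.sub(r"(?<!\w)_(.+?)_(?!\w)", r"<emphasis>\1</emphasis>", escape(para))
        out.append(f"  <para>{para}</para>")
    out.append("</chapter>")
    return "\n".join(out)

SRC = """== Autobiography of Me

I was born in 1980, I love chocolate ice cream, and I am a _wicked awesome_ writer, yo!
"""
print(toy_asciidoc_to_docbook(SRC))
```

Its output has the same shape as the DocBook chapter shown earlier, which is why AsciiDoc can serve as an optional front end to a DocBook workflow.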

Text Editor with AsciiDoc Support

O’Reilly Atlas

Semantic Markup Option #3: HTML

“Say what? HTML?”

Ebooks are composed of HTML…

So, why not write them in HTML?

HTML5 = New Structural Semantics

•  <article>

•  <aside>

•  <header>

•  <figure>

•  <footer>

•  <nav>

•  <section>

But ebooks require a richer content model!!!

•  More robust semantics for book-specific elements, e.g., chapter, appendix, glossary

•  Explicit, enforceable rules for structure, e.g., no <h1>s lower in the hierarchy than <h2>s

Introducing the HTMLBook Project: http://github.com/oreillymedia/HTMLBook
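An "enforceable rule for structure" such as "no <h1>s lower in the hierarchy than <h2>s" can be checked mechanically. The sketch below is a hypothetical checker for that one rule only, not HTMLBook's own validation; it assumes well-formed, un-namespaced XHTML-style markup in which each `<section>` starts with an `<hN>` heading.

```python
import re
import xml.etree.ElementTree as ET
from typing import Optional, Tuple

HEADING = re.compile(r"^h([1-6])$")

def first_heading(section: ET.Element) -> Optional[Tuple[int, str]]:
    """Return (N, text) for a section's first <hN> child, or None if it has none."""
    for child in section:
        m = HEADING.match(child.tag)
        if m:
            return int(m.group(1)), (child.text or "").strip()
    return None

def check_hierarchy(elem: ET.Element, parent_level: int = 0) -> list:
    """List headings that outrank (have a smaller N than) an enclosing section's heading."""
    errors = []
    for child in elem:
        level = parent_level
        if child.tag == "section":
            found = first_heading(child)
            if found:
                n, text = found
                if n < parent_level:
                    errors.append(f"<h{n}> '{text}' is nested under an <h{parent_level}>")
                level = max(n, parent_level)
        errors.extend(check_hierarchy(child, level))
    return errors

GOOD = """<body>
  <section><h1>Chapter 1</h1>
    <section><h2>Background</h2></section>
  </section>
</body>"""

BAD = """<body>
  <section><h2>Background</h2>
    <section><h1>Oops</h1></section>
  </section>
</body>"""

print(check_hierarchy(ET.fromstring(GOOD)))  # []
print(check_hierarchy(ET.fromstring(BAD)))   # ["<h1> 'Oops' is nested under an <h2>"]
```

Equal heading levels at nested depths are deliberately allowed, since the rule as stated only forbids a higher-ranking heading below a lower-ranking one.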

“That’s nice, but what’s in it for me if I develop my (e)book in DocBook or AsciiDoc or HTML?”

#2 Single Source, Multiple Outputs

Welcome to Conversion City: Enjoy Your Stay!

Conversion! Conversion! Conversion!

The Single-Source Model: one XML or HTML source, multiple outputs

Advantages of the Single-Source Model:

•  All authoring/edits are made to just one set of files. No need to maintain multiple sets of files.

•  Outputs are produced by transforms, not conversions.

•  Transforms are automated, fast, infinitely repeatable, and do not require cleanup afterward.

•  The model is extensible. Add new output formats by adding a new transform. Workflow doesn’t need to be reinvented.
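The extensibility point can be sketched in a few lines: parse one source once and register one transform function per output format, so adding a format means adding a function. The format names and the two toy transforms below are hypothetical stand-ins for real toolchains such as the DocBook XSL stylesheets, and they skip escaping for brevity.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List

SOURCE = """<chapter><title>Autobiography of Me</title>
<para>I am a <emphasis>wicked awesome</emphasis> writer, yo!</para></chapter>"""

def render_inline(para: ET.Element, em_open: str, em_close: str) -> str:
    """Flatten a <para>'s mixed content, wrapping <emphasis> in the given markers."""
    out = [para.text or ""]
    for child in para:
        text = child.text or ""
        out.append(f"{em_open}{text}{em_close}" if child.tag == "emphasis" else text)
        out.append(child.tail or "")
    return "".join(out)

def to_html(ch: ET.Element) -> str:
    """Toy transform #1: chapter -> HTML fragment."""
    lines = [f"<h1>{ch.findtext('title')}</h1>"]
    lines += [f"<p>{render_inline(p, '<em>', '</em>')}</p>" for p in ch.iter("para")]
    return "\n".join(lines)

def to_text(ch: ET.Element) -> str:
    """Toy transform #2: chapter -> plain text with *asterisk* emphasis."""
    lines = [ch.findtext("title").upper()]
    lines += [render_inline(p, "*", "*") for p in ch.iter("para")]
    return "\n\n".join(lines)

# One registry entry per output format; a new format is just a new function.
TRANSFORMS: Dict[str, Callable[[ET.Element], str]] = {"html": to_html, "text": to_text}

def build(source: str, formats: List[str]) -> Dict[str, str]:
    tree = ET.fromstring(source)  # parse the single source once
    return {fmt: TRANSFORMS[fmt](tree) for fmt in formats}

for fmt, output in build(SOURCE, ["html", "text"]).items():
    print(f"--- {fmt} ---\n{output}")
```

Edits go into `SOURCE` only; every output is regenerated from it, so nothing needs manual cleanup afterward.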

ASC/DB Single-Source Workflow:

Source content: AsciiDoc (optional; can start with DocBook) → asciidoc.py → DocBook XML

From DocBook XML:

•  DocBook XSL EPUB Stylesheets + Custom CSS → EPUB

•  DocBook XSL HTML5 Stylesheets → HTML5, then AntennaHouse + Print CSS3 → Print PDF, and AntennaHouse + Web CSS3 → Web PDF

•  DocBook XSL EPUB Stylesheets → EPUB → Custom XSL for EPUB postprocessing + KF8/Mobi7 CSS → Mobi-ready EPUB → Kindlegen → Mobi (KF8)

Final outputs for sale: EPUB, Print PDF, Web PDF, and Mobi (KF8).

HTML5 Single-Source Workflow:

Source content: HTML5

•  Packaging XSL + CSS → EPUB

•  AntennaHouse + Print CSS3 → Print PDF

•  AntennaHouse + Web CSS3 → Web PDF

•  Packaging XSL + CSS → EPUB → Custom XSL for EPUB postprocessing + KF8/Mobi7 CSS → Mobi-ready EPUB → Kindlegen → Mobi (KF8)

Final outputs for sale: EPUB, Print PDF, Web PDF, and Mobi (KF8).

O’Reilly Atlas Ebook Build UI

#1. Pick ebook formats to build

#2. Pick content files to build

#3. Click “Build”

#3 Automate Your Headaches Away

1776: Manuscript edits cannot be automated. (Image: http://commons.wikimedia.org/wiki/File:Quill_(PSF).svg)

2012: Manuscript edits can be automated. (Image: http://www.flickr.com/photos/asurroca/3699873444/, some rights reserved by ASurroca)

Tools for Scripting Word Documents

•  Macros

•  Visual Basic for Applications (VBA)

•  PowerShell

Tools for Scripting Plaintext (AsciiDoc/XML) Documents

•  Ruby

•  Python

•  Perl

•  Java

•  XPath/XSLT/XQuery

•  JavaScript

•  Regex

•  Emacs/vi

•  sed

•  And many more…

Fix My Manuscript with One Line of Code!

Request #1: “In the important scientific article below, please change all superscripts to subscripts, except in informal equation elements.”

DocBook XML Manuscript:

<chapter id="chap1">
  <title>Makin’ Water and Energy</title>
  <para>Makin’ water is really easy. The formula is H<superscript>2</superscript>O, so you just take some H<superscript>2</superscript>, and add some O.</para>
  <para>Also, here’s how you make energy (per Einstein):</para>
  <informalequation>
    <mathphrase>E = mc<superscript>2</superscript></mathphrase>
  </informalequation>
</chapter>

PDF Output: (rendered page not reproduced)


Solution #1: XPath to the rescue!

Revised DocBook Manuscript:

<chapter id="chap1">
  <title>Makin’ Water and Energy</title>
  <para>Makin’ water is really easy. The formula is H<subscript>2</subscript>O, so you just take some H<subscript>2</subscript>, and add some O.</para>
  <para>Also, here’s how you make energy (per Einstein):</para>
  <informalequation>
    <mathphrase>E = mc<superscript>2</superscript></mathphrase>
  </informalequation>
</chapter>

PDF Output: (rendered page not reproduced)

$ xmlstarlet ed -r "//superscript[not(ancestor::informalequation)]" -v "subscript" book.xml

•  xmlstarlet: XML command

•  ed: make an edit

•  -r: rename

•  //superscript[not(ancestor::informalequation)]: select superscripts… that are not… inside… informal equations

•  -v "subscript": v = replacement value; replace with subscripts

•  book.xml: do all this on book.xml
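For comparison, here is a sketch of the same edit in Python with the standard `xml.etree.ElementTree` module. ElementTree's limited XPath support has no `ancestor::` axis, so the sketch walks the tree itself and tracks whether it is inside an `informalequation`; it is an alternative to, not a reproduction of, what xmlstarlet does internally. (Note also that `xmlstarlet ed` writes the edited document to standard output unless you ask it to edit in place.)

```python
import xml.etree.ElementTree as ET

def superscripts_to_subscripts(elem: ET.Element, in_equation: bool = False) -> int:
    """Rename <superscript> to <subscript> unless inside <informalequation>.

    Returns the number of elements renamed.
    """
    inside = in_equation or elem.tag == "informalequation"
    renamed = 0
    if elem.tag == "superscript" and not inside:
        elem.tag = "subscript"  # rename in place; text, attributes, tail are kept
        renamed += 1
    for child in elem:
        renamed += superscripts_to_subscripts(child, inside)
    return renamed

MANUSCRIPT = """<chapter id="chap1">
  <title>Makin’ Water and Energy</title>
  <para>The formula is H<superscript>2</superscript>O.</para>
  <informalequation><mathphrase>E = mc<superscript>2</superscript></mathphrase></informalequation>
</chapter>"""

root = ET.fromstring(MANUSCRIPT)
print(superscripts_to_subscripts(root))  # 1
print(ET.tostring(root, encoding="unicode"))
```

The one-liner is shorter; the script version is easier to extend when the next editorial request adds more exceptions.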

Fix My Manuscript with One Line of Code!

Request #2: “House style for dates is YYYY-MM-DD. Can you please fix it in the manuscript below?”

AsciiDoc Manuscript:

== Kindergarten Lemonade Sales

.Lemonade sales by Kindergarten Lemonade, LLC
[options="header"]
|================
|Date|Lemonade Sold|
|3/15/12|6 glasses|
|4/22/10|10 glasses|
|5/31/12|2 glasses|
|7/14/11|4 glasses|
|8/19/12|1 glass|
|9/24/12|432 glasses|
|================

PDF Output: (rendered table not reproduced)


Solution #2: Regex FTW!

Revised AsciiDoc Manuscript:

== Kindergarten Lemonade Sales

.Lemonade sales by Kindergarten Lemonade, LLC
[options="header"]
|================
|Date|Lemonade Sold|
|2012-03-15|6 glasses|
|2010-04-22|10 glasses|
|2012-05-31|2 glasses|
|2011-07-14|4 glasses|
|2012-08-19|1 glass|
|2012-09-24|432 glasses|
|================

PDF Output: (rendered table not reproduced)

$ perl -p -e 's#^(.*)([1-9])/([0-9]{2})/([0-9]{2})(.*)$#${1}20$4-0$2-$3$5#g' book.asc

•  perl: Perl script

•  -p: print each line…

•  -e: …after running the following regex

•  Capture the following pattern: (.*) chars before date; ([1-9]) digit in month; ([0-9]{2}) digits in day; ([0-9]{2}) digits in year (two-digit, as in the manuscript); (.*) chars after date

•  Specify replacement pattern: ${1} chars before date; 20$4 year; 0$2 month; $3 day; $5 chars after date

•  book.asc: perform on this file
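The same house-style fix as a Python sketch, using `re.sub` with a replacement function. Like the table above, it assumes dates written as M/D/YY or MM/DD/YY with a two-digit year in the 2000s; the `century` parameter and the `to_house_style` name are illustrative assumptions. Because it formats numbers with zero-padding, it also handles two-digit months and one-digit days.

```python
import re

# M/D/YY or MM/DD/YY, not preceded or followed by another digit
US_DATE = re.compile(r"(?<!\d)(\d{1,2})/(\d{1,2})/(\d{2})(?!\d)")

def to_house_style(text: str, century: int = 2000) -> str:
    """Rewrite M/D/YY dates as YYYY-MM-DD (assumes every year falls in `century`)."""
    def repl(m: re.Match) -> str:
        month, day, year = (int(g) for g in m.groups())
        return f"{century + year:04d}-{month:02d}-{day:02d}"
    return US_DATE.sub(repl, text)

TABLE = """|Date|Lemonade Sold|
|3/15/12|6 glasses|
|4/22/10|10 glasses|"""

print(to_house_style(TABLE))
```

Running it over the whole `.asc` file is a matter of reading the file into a string, calling `to_house_style`, and writing the result back.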

#4 Versioning is the New Spell-Check

Two Questions About Your (e)Book’s Editorial Lifecycle

1. Will more than one person be working on the manuscript files?

2. Will there be more than one draft of the manuscript?

If you answered yes to either question, you need a version-control system.

Key Feature #1 of Version Control: Revision Snapshots
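Git, the version-control system behind GitHub, identifies each snapshot of a file's contents by a SHA-1 hash of those bytes prefixed with a small header (`blob <size>` followed by a NUL byte). The sketch below reproduces that calculation with Python's `hashlib`; for a default (SHA-1) repository it should match what `git hash-object` prints for the same bytes. The two draft strings are made up for illustration.

```python
import hashlib

def git_blob_id(content: bytes) -> str:
    """Compute the ID Git assigns to a file's contents (a 'blob' object)."""
    header = f"blob {len(content)}\0".encode()
    return hashlib.sha1(header + content).hexdigest()

draft1 = "I am a wicked awesome writer, yo!\n".encode("utf-8")
draft2 = "I am a wicked awesome writer.\n".encode("utf-8")
print(git_blob_id(draft1))
print(git_blob_id(draft2))  # a different ID: the edit produced a new snapshot
```

Because the ID depends only on content, any change to a manuscript file, however small, yields a new, distinct snapshot, while unchanged files are stored once.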

Key Feature #2 of Version Control: Diffing
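Diffing can be tried without any version-control system at all. Python's standard `difflib.unified_diff` produces the same unified-diff format that Git shows, with removed lines prefixed `-` and added lines prefixed `+`. The file names and draft text below are made up for illustration.

```python
import difflib

draft1 = ["I was born in 1980.\n", "I love chocolate ice cream.\n", "I am a writer.\n"]
draft2 = ["I was born in 1980.\n", "I love vanilla ice cream.\n", "I am a writer.\n"]

diff = difflib.unified_diff(draft1, draft2, fromfile="memoir_v1.txt", tofile="memoir_v2.txt")
print("".join(diff))
```

An editor reviewing a new draft only has to read the changed lines, rather than rereading the whole manuscript.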

What if we versioned manuscripts like software developers version code?

Revision snapshots in GitHub

Pro Git: https://github.com/progit/progit

Diffing in GitHub

(English to Portuguese translation)

#5 Always Think “Digital First”

There is a difference between a digitized text and a digital text.

Digitized Text = Digital Last: “Let’s make a print book and then get it converted to an ebook.”

Digital Text = Digital First: “Let’s make an ebook.”

What Does Digital First Look Like?

Welcome to O’Reilly Labs (http://chimera.labs.oreilly.com/):

•  Interactive examples!

•  Inline commenting!

•  Integrated multimedia!

Contact Me! Email: sanders@oreilly.com

Twitter: @sandersk
