Reality Check: What to Expect from Automated Conversion to eBook


Upload: intelligent-content-conference

Post on 07-Nov-2014


DESCRIPTION

This presentation reviews what an automated approach can (and can’t) do, issues that are best dealt with preconversion, and issues that are best dealt with postconversion. It covers some of the specific problems encountered when converting to EPUB & MOBI from different source types; the limitations of automated conversion as well as a suggested approach; the difference between EPUB & MOBI and their supported devices; and important things to keep in mind for special content. Learn what to consider in advance and what preparations to make so that the changeover is easier to manage and free of surprises.

TRANSCRIPT

June 25, 2008

Reality Check

What to expect from automated conversion to eBook

Mark Gross

2

About this Presentation

• Recent eBook survey results

• A very quick intro to eBooks

• Conversions from HTML & PDF

• Limitations of automated conversions

• A suggested approach

• Things to keep in mind with special content

3

About Us

• Providing publishing and XML-related services for 30 years, successfully converting over a billion pages

• Privately held woman-owned small business headquartered in New York City

• Expertise in large complex conversion projects

• Substantial experience in managing multiple vendors for large-scale projects, with automated tracking and reporting of data throughout

• Sophisticated quality control workflow with both automated and human quality control steps to guarantee accuracy

• Publish a monthly newsletter devoted to SGML/XML and Electronic Publishing topics with a subscriber base of over 7,000

• Wrote the data conversion chapters in The XML Handbook and The Columbia Guide to Digital Publishing

4

Highlights From Our Recent eBook Survey

• Majority (63%) said the next book they publish will be an eBook

• Accuracy is the top issue, rather than cost and turnaround time

• Not just novels - 75% are planning eBooks for complex books

• iPad and Kindle users lead (44% & 36%), with others far behind

• Most want their books to work on everything – ePub, Kindle, and more

• Most respondents (65%) are currently earning money from eBooks

5

Very Quick Introduction to eBooks

• ePub is the emerging standard used for most eReaders

• Mobi is also a large player, but proprietary to Amazon Kindle

• ePub is evolving

• ePub is supported differently by different eReaders

• eBooks are publications and need care in their production

• There are no “Silver Bullets”

6

Things to Keep in Mind When Converting from HTML

• Smaller screen size

• Large tables may not fit

• Not all character sets are supported by all devices

• MathML is rarely supported
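The "large tables may not fit" point can be checked before conversion. The following is a minimal sketch, not part of the original presentation, that scans HTML source and flags tables whose widest row exceeds a column threshold. The names `TableWidthChecker` and `wide_tables` and the default threshold of 4 columns are assumptions chosen for illustration; a real limit depends on the target device. Nested tables are not handled.

```python
from html.parser import HTMLParser


class TableWidthChecker(HTMLParser):
    """Record the widest row (in cells, honoring colspan) of each table.

    Hypothetical helper for illustration; assumes tables are not nested.
    """

    def __init__(self):
        super().__init__()
        self.widths = []  # widest row of each table, in document order
        self._row = 0     # running cell count of the row being read

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.widths.append(0)
        elif tag == "tr":
            self._row = 0
        elif tag in ("td", "th") and self.widths:
            span = dict(attrs).get("colspan") or "1"
            self._row += int(span) if span.isdigit() else 1
            self.widths[-1] = max(self.widths[-1], self._row)


def wide_tables(html, max_columns=4):
    """Return (table_number, column_count) for tables wider than max_columns.

    max_columns=4 is an assumed threshold, not a device specification.
    """
    checker = TableWidthChecker()
    checker.feed(html)
    return [(i, w) for i, w in enumerate(checker.widths, start=1)
            if w > max_columns]
```

Tables flagged this way are candidates for restructuring or for conversion to an image before the eBook is built.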

7

Some Things to Keep in Mind When Converting from PDF

• Page layout concept

• More than one column

• Index – is linking necessary?

• Objects mid-paragraph

8

Handling of Objects Mid-Paragraph

Converting directly may lead to problems …

9

What Happens in an Automated PDF Conversion

10

Source Document

11

Product #1 Automated Conversion Output

• Chapter header found mid-paragraph

• Multiple links to the same chapter heading

• Emphasis not retained

• Paragraph breaks do not match source

• Lots of extraneous data

12

Product #2 Automated Conversion Output

• Footnote linking character captured as plain text

• Indented formatting not retained

• Random characters missing (“ex” instead of “exact”)

• Emphasis not retained

13

Product #3 Automated Conversion Output

• Extra spaces around punctuation

• Missing spaces between words

14

Product #3 (cont’d)

• PDF repeating header captured as plain text repeatedly

• Merged paragraphs

• Unnecessary hyphens
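Some of the problems listed for Product #3 (repeated running headers, unnecessary hyphens, extra spaces around punctuation) are mechanical enough to reduce with a clean-up pass. The sketch below is an illustration under stated assumptions, not the workflow used in the presentation: all function names are hypothetical, a "header" is assumed to be any line repeated at least three times, and every hyphen at a line break is assumed to be a soft hyphen. Missing spaces between words and merged paragraphs are not addressed and still need proofreading.

```python
import re
from collections import Counter


def drop_repeated_headers(lines, min_repeats=3):
    """Drop every non-empty line that occurs min_repeats times or more.

    Heuristic only: assumes such lines are running headers or footers.
    """
    counts = Counter(line.strip() for line in lines if line.strip())
    return [line for line in lines if counts[line.strip()] < min_repeats]


def rejoin_hyphenated(text):
    """Join words split by a hyphen at a line break.

    Assumes the hyphen is a soft hyphen; true compounds split at a
    line end would be joined incorrectly and need manual review.
    """
    return re.sub(r"(\w)-\n(\w)", r"\1\2", text)


def fix_punctuation_spacing(text):
    """Remove spaces before punctuation; add one after , ; : ! ? if missing."""
    text = re.sub(r"[ \t]+([,.;:!?])", r"\1", text)
    return re.sub(r"([,;:!?])(?=[A-Za-z])", r"\1 ", text)


def clean_extracted_text(text):
    """Apply the three clean-up steps in order."""
    lines = drop_repeated_headers(text.split("\n"))
    text = rejoin_hyphenated("\n".join(lines))
    return fix_punctuation_spacing(text)
```

A pass like this only narrows the proofreading workload; the output still has to be compared against the source.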

15

Approach to Converting PDF to an eBook

1. Log and Review Materials

2. Zoning and Text Extraction

3. Image Cropping

4. Proofreading / Clean-up

5. Styling / Pre-Tagging

6. Convert to HTML

7. Edit CSS Based on Look of Source

8. ePub Creation

9. Validate

10 to 12. Final Quality Control View and Final Delivery

16

Intermediary Word Document (after pre-tagging and cleanup)

17

Final ePub Output

18

Tools for ePub Validation

ePubCheck – validates against ePub standard

code.google.com/p/epubcheck

ePubPreflight – checks for device-specific issues

threepress.org/document/epub-validate
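Before running a full validator, a few container-level checks can catch a badly packaged file early. The sketch below is an illustration only and is no substitute for ePubCheck: it tests three packaging rules (the `mimetype` entry comes first, contains exactly `application/epub+zip`, and is stored uncompressed) plus the presence of `META-INF/container.xml`. The function names `quick_epub_problems` and `build_minimal_epub` are hypothetical.

```python
import io
import zipfile

EPUB_MIMETYPE = b"application/epub+zip"


def quick_epub_problems(source):
    """Return a list of container-level problems; an empty list means none found.

    `source` is a file path or a binary file-like object.
    """
    problems = []
    with zipfile.ZipFile(source) as archive:
        names = archive.namelist()
        if not names or names[0] != "mimetype":
            problems.append("'mimetype' must be the first entry in the archive")
        elif archive.read("mimetype") != EPUB_MIMETYPE:
            problems.append("'mimetype' must contain exactly application/epub+zip")
        elif archive.getinfo("mimetype").compress_type != zipfile.ZIP_STORED:
            problems.append("'mimetype' must be stored uncompressed")
        if "META-INF/container.xml" not in names:
            problems.append("META-INF/container.xml is missing")
    return problems


def build_minimal_epub():
    """Build an in-memory archive that passes the checks above (demo only).

    The result is not a complete eBook; it has no package document or content.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("mimetype", EPUB_MIMETYPE.decode("ascii"),
                         compress_type=zipfile.ZIP_STORED)
        archive.writestr("META-INF/container.xml", "<container/>")
    return buffer
```

Passing these checks says nothing about the content documents, metadata, or device-specific rendering, which is what the tools listed above are for.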

19

Things to Keep in Mind with Special Content

20

Math as Images – Changing Font Size Doesn’t Change Images

21

Unicode Symbols Will Adjust with the Font Size Change

Large Tables

Table as Text (searchable but cut off) Table as Image

22

23

When Layout Matters

Examples: testing materials, poetry, letters, recipes

24

When Layout Matters (cont’d)

25

Things to Keep in Mind when Converting for Kindle

26

Some Notes on the Kindle

Traditional Kindle

• Designed for reading long documents

• Designed for simplicity

• Has some features that others don’t

• But is also missing some features that others have

• Therefore, the conversion needs to be designed differently

Kindle Fire

• Supports the KF8 format, which allows more styling, the CSS float property, drop caps, and some HTML5 tagging

• However, new features are not backwards compatible

27

Glossary Definitions

iPad screenshot Kindle screenshot

28

Use of CSS “Float” Style

iPad screenshot Kindle screenshot

29

Use of Borders

iPad screenshot Kindle screenshot

Color/Spanning/Large Tables

iPad screenshot Kindle screenshot

30

31

Importance of Viewing on the Actual Device

Kindle for PC

Actual Kindle Device

32

Importance of Viewing on the Actual Device (cont’d)

Kindle for PC

Actual Kindle Device

33

What We Learned

• For most materials, automated conversion isn’t ready for prime time

• Since different devices render differently, multiple outputs are recommended

• Special content requires special attention

• Review your converted content on its intended device

• It’s your book – it’s worth the effort to make it come out right!

34

Questions & Answers

Data Conversion Laboratory

61-18 190th St., 2nd Floor

Fresh Meadows, NY 11365

Telephone: (718) 357-8700

Fax: (718) 357-8776

Web: http://www.dclab.com

Mark Gross, President

[email protected]

718-307-5711