Reality Check: What to Expect from an Automated Conversion to eBook
DESCRIPTION
This presentation reviews what an automated approach can (and can’t) do, issues that are best dealt with preconversion, and issues that are best dealt with postconversion. It covers some of the specific problems encountered when converting to EPUB & MOBI from different source types; the limitations of automated conversion as well as a suggested approach; the difference between EPUB & MOBI and their supported devices; and important things to keep in mind for special content. Learn about the kinds of things that should be considered in advance, and the kinds of preparations you can make to manage the changeover process more easily, with no surprises.
TRANSCRIPT
2
About this Presentation
• Recent eBook survey results
• A very quick intro to eBooks
• Conversions from HTML & PDF
• Limitations of automated conversions
• A suggested approach
• Things to keep in mind with special content
3
About Us
• Providing publishing and XML-related services for 30 years, successfully
converting over a billion pages
• Privately held woman-owned small business headquartered in New York City
• Expertise in large complex conversion projects
• Substantial experience in managing multiple vendors for large-scale projects,
with automated tracking and reporting of data throughout
• Sophisticated quality control workflow with both automated and
human quality control steps to guarantee accuracy
• Publish a monthly newsletter devoted to SGML/XML and
Electronic Publishing topics with a subscriber base of over 7,000
• Wrote the data conversion chapters in The XML Handbook and
The Columbia Guide to Digital Publishing
4
Highlights From Our Recent eBook Survey
• Majority (63%) said the next book they publish will be an eBook
• Accuracy is the top issue, rather than cost and turnaround time
• Not just novels - 75% are planning eBooks for complex books
• iPad and Kindle users lead (44% & 36%), with others far behind
• Most want their books to work on everything – ePub, Kindle, and more
• Most respondents (65%) are currently earning money from eBooks
5
Very Quick Introduction to eBooks
• ePub is the emerging standard used for most eReaders
• Mobi is also a large player, but proprietary to Amazon Kindle
• ePub is evolving
• ePub is supported differently by different eReaders
• eBooks are publications and need care in their production
• There are no “Silver Bullets”
6
Things to Keep in Mind When Converting from HTML
• Smaller screen size
• Large tables may not fit
• Not all Character Sets supported by all devices
• MathML not supported very often
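The points on this slide can be checked before conversion starts. The following is a minimal sketch, assuming the source is a single HTML string; the names PreflightScanner and scan_html and the max_columns threshold are hypothetical and not part of any tool mentioned in this presentation. It counts tables whose rows have more cells than the threshold, notes whether any MathML is present, and collects non-ASCII characters so they can be tested on the target devices.

```python
from html.parser import HTMLParser


class PreflightScanner(HTMLParser):
    """Collects features of an HTML source that often need attention on eReaders."""

    def __init__(self, max_columns=4):
        super().__init__()
        self.max_columns = max_columns  # hypothetical cut-off for a "wide" table
        self.wide_tables = 0            # tables with a row wider than max_columns
        self.has_mathml = False         # True once a <math> element is seen
        self.non_ascii = set()          # non-ASCII characters found in text
        self._table_depth = 0
        self._cells_in_row = 0
        self._widest_row = 0

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._table_depth += 1
            if self._table_depth == 1:
                self._widest_row = 0
        elif tag == "tr":
            self._cells_in_row = 0
        elif tag in ("td", "th") and self._table_depth:
            # colspan is ignored here, so merged cells are undercounted
            self._cells_in_row += 1
            self._widest_row = max(self._widest_row, self._cells_in_row)
        elif tag == "math":
            self.has_mathml = True

    def handle_endtag(self, tag):
        if tag == "table" and self._table_depth:
            self._table_depth -= 1
            if self._table_depth == 0 and self._widest_row > self.max_columns:
                self.wide_tables += 1

    def handle_data(self, data):
        self.non_ascii.update(ch for ch in data if ord(ch) > 127)


def scan_html(source, max_columns=4):
    """Run the scanner over one HTML string and return it for inspection."""
    scanner = PreflightScanner(max_columns)
    scanner.feed(source)
    scanner.close()
    return scanner
```

A report built from wide_tables, has_mathml and non_ascii only tells you where to look; whether a given table or character actually displays correctly still has to be confirmed on the device.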
7
Some Things to Keep in Mind When Converting from PDF
• Page layout concept
• More than one column
• Index – is linking necessary?
• Objects mid-paragraph
11
Product #1 Automated Conversion Output
• Chapter header found mid-paragraph
• Multiple links to the same chapter heading
• Emphasis not retained
• Paragraph breaks do not match source
• Lots of extraneous data
12
Product #2 Automated Conversion Output
• Footnote linking character captured as plain text
• Indented formatting not retained
• Random characters missing (“ex” vs. “exact”)
• Emphasis not retained
13
Product #3 Automated Conversion Output
• Extra spaces around punctuation
• Missing spaces between words
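Spacing errors like these can be partly repaired in a clean-up pass. The sketch below is an illustration only, not the method used by any of the products shown; fix_punctuation_spacing and split_run_together are hypothetical names, and the vocabulary passed to the second function is assumed to be a word list you supply. Rules this simple will misfire on abbreviations, URLs and similar text, so the result still needs proofreading.

```python
import re


def fix_punctuation_spacing(text):
    """Apply simple rules for stray or missing spaces around punctuation."""
    # Remove spaces before punctuation ("word ," -> "word,"),
    # but leave cases such as " .5" alone.
    text = re.sub(r" +([,.;:!?])(?!\d)", r"\1", text)
    # Add a space after , ; : ! ? when a letter follows directly.
    text = re.sub(r"([,;:!?])(?=[A-Za-z])", r"\1 ", text)
    # Add a space after a period that is followed by a capitalized word.
    text = re.sub(r"\.(?=[A-Z][a-z])", ". ", text)
    # Collapse runs of spaces.
    return re.sub(r" {2,}", " ", text)


def split_run_together(word, vocabulary):
    """Split an unknown word into two known words if possible, else return it unchanged."""
    lower = word.lower()
    if lower in vocabulary:
        return word
    for i in range(1, len(word)):
        if lower[:i] in vocabulary and lower[i:] in vocabulary:
            return word[:i] + " " + word[i:]
    return word
```

Words that cannot be split with confidence are better flagged for a human proofreader than changed automatically.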
14
Product #3 (cont’d)
• PDF repeating header captured as plain text repeatedly
• Merged paragraphs
• Unnecessary hyphens
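Repeating headers and leftover hyphens are typical of text pulled page by page from a PDF. As a rough sketch, assuming the extracted text is available as a list of pages, each a list of lines, the two hypothetical helpers below drop a first line that repeats on most pages and rejoin words split across line ends. Merged paragraphs are not addressed here, because recovering paragraph breaks generally needs layout information from the source.

```python
import re
from collections import Counter


def strip_running_headers(pages, min_share=0.6):
    """Remove a first line that repeats on at least min_share of the pages.

    pages is a list of pages, each a list of text lines. Headers that include
    a changing page number will not match and are left in place.
    """
    first_lines = Counter(page[0].strip() for page in pages if page)
    threshold = max(2, min_share * len(pages))
    repeated = {line for line, n in first_lines.items() if line and n >= threshold}
    return [
        page[1:] if page and page[0].strip() in repeated else page
        for page in pages
    ]


def join_hyphenated_lines(lines):
    """Rejoin words split at a line end: 'conver-' + 'sion' becomes 'conversion'.

    Genuine compounds broken at the hyphen (e.g. 'well-' + 'known') lose their
    hyphen under this rule, so the output should be proofread.
    """
    out = []
    for line in lines:
        if out and re.search(r"[A-Za-z]-$", out[-1]) and re.match(r"[a-z]", line):
            out[-1] = out[-1][:-1] + line
        else:
            out.append(line)
    return out
```

The min_share value of 0.6 is an arbitrary starting point; chapter opening pages often have no running header, so the threshold has to sit below 100 percent.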
15
Approach to Converting PDF to an eBook
1. Log and Review Materials
2. Zoning and Text Extraction
3. Image Cropping
4. Proofreading / Clean-up
5. Styling / Pre-Tagging
6. Convert to HTML
7. Edit CSS Based on Look of Source
8. ePub Creation
9. Validate
10-12. Final Quality Control View and Final Delivery
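For the ePub creation step, the packaging itself is mechanical once the HTML, CSS and package (OPF) file exist. The sketch below assumes those files are already prepared and passed in as a dictionary; package_epub and the default path OEBPS/content.opf are hypothetical choices for illustration. It shows the two container rules that are easy to get wrong: the mimetype entry goes first and is stored uncompressed, and META-INF/container.xml points to the package file.

```python
import zipfile

CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="{opf}" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""


def package_epub(out_path, files, opf_path="OEBPS/content.opf"):
    """Zip prepared content into an EPUB container.

    files maps archive names (e.g. "OEBPS/ch1.xhtml") to bytes or str.
    The content itself is not checked here; that is the job of validation.
    """
    if opf_path not in files:
        raise ValueError("files must include the package document: " + opf_path)
    with zipfile.ZipFile(out_path, "w") as zf:
        # The mimetype entry must be the first entry and must not be compressed.
        zf.writestr("mimetype", "application/epub+zip",
                    compress_type=zipfile.ZIP_STORED)
        zf.writestr("META-INF/container.xml",
                    CONTAINER_XML.format(opf=opf_path),
                    compress_type=zipfile.ZIP_DEFLATED)
        for name, data in files.items():
            zf.writestr(name, data, compress_type=zipfile.ZIP_DEFLATED)
```

A file built this way is only a container; whether the package document and content inside it are correct is what the validation and quality control steps are for.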
18
Tools for ePub Validation
ePubCheck – validates against ePub standard
code.google.com/p/epubcheck
ePubPreflight – checks for device-specific issues
threepress.org/document/epub-validate
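ePubCheck is distributed as a Java program and is normally run from the command line as java -jar epubcheck.jar followed by the file name. A small wrapper makes it easy to run over every file in a batch. This is a sketch under assumptions: the jar location is a placeholder, the function names are hypothetical, and treating a zero exit code as "no errors reported" should be confirmed against the version you install.

```python
import subprocess


def epubcheck_command(epub_path, jar_path="epubcheck.jar"):
    """Build the command line for ePubCheck (jar_path is a placeholder)."""
    return ["java", "-jar", jar_path, epub_path]


def run_epubcheck(epub_path, jar_path="epubcheck.jar"):
    """Run ePubCheck and return (passed, report_text).

    Assumes Java is installed and that a zero exit code means the
    validator reported no errors.
    """
    result = subprocess.run(
        epubcheck_command(epub_path, jar_path),
        capture_output=True,
        text=True,
    )
    return result.returncode == 0, result.stdout + result.stderr
```

Passing validation only means the file conforms to the standard; it does not show how the book will look on a particular device.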
26
Some Notes on the Kindle
Traditional Kindle
• Designed for reading long documents
• Designed for simplicity
• Has some features that others don’t
• But also missing some features that others have
• Therefore, need to design the conversion differently
Kindle Fire
• Supports the KF8 format, allowing for more styling, the
CSS float property, drop caps, and some HTML5 tagging
• However, new features are not backwards compatible
33
What We Learned
• For most materials, automated conversion isn’t ready for
prime time
• Since different devices render differently, multiple outputs are
recommended
• Special content requires special attention
• Review your converted content on its intended device
• It’s your book – it’s worth the effort to make it come out right!
34
Questions...
& Answers
Data Conversion Laboratory
61-18 190th St., 2nd Floor
Fresh Meadows, NY 11365
Telephone: (718) 357-8700
Fax: (718) 357-8776
Web: http://www.dclab.com
Mark Gross, President
718-307-5711