About this Presentation
• Recent eBook survey results
• A very quick intro to eBooks
• Conversions from HTML & PDF
• Limitations of automated conversions
• A suggested approach
• Things to keep in mind with special content
About Us
• Providing publishing and XML-related services for 30 years, successfully
converting over a billion pages
• Privately held woman-owned small business headquartered in New York City
• Expertise in large complex conversion projects
• Substantial experience in managing multiple vendors for large-scale projects,
with automated tracking and reporting of data throughout
• Sophisticated quality control workflow with both automated and
human quality control steps to guarantee accuracy
• Publish a monthly newsletter on SGML/XML and electronic publishing topics,
with over 7,000 subscribers
• Wrote the data conversion chapters in The XML Handbook and
The Columbia Guide to Digital Publishing
Highlights From Our Recent eBook Survey
• A majority (63%) said the next book they publish will be an eBook
• Accuracy is the top concern, ahead of cost and turnaround time
• Not just novels: 75% are planning eBooks for complex books
• iPad and Kindle users lead (44% and 36%), with other devices far behind
• Most want their books to work on everything – ePub, Kindle, and more
• Most respondents (65%) are currently earning money from eBooks
Very Quick Introduction to eBooks
• ePub is the emerging standard used for most eReaders
• Mobi is also a major player, but it is proprietary to the Amazon Kindle
• ePub is evolving
• ePub is supported differently by different eReaders
• eBooks are publications and need care in their production
• There is no “silver bullet”
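To make the ePub bullets concrete: an ePub file is a ZIP archive whose first entry is an uncompressed `mimetype` file, followed by `META-INF/container.xml`, which points to a package (OPF) document listing the book's content files. The Python sketch below builds a minimal EPUB 3 skeleton in memory. The title, identifier, date, file names and chapter text are placeholders, and the output should still be run through ePubCheck rather than assumed valid.

```python
import io
import zipfile

# Placeholder metadata and file names; replace with real book data.
CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>"""

CONTENT_OPF = """<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="3.0" unique-identifier="bookid">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="bookid">urn:uuid:12345678-1234-1234-1234-123456789abc</dc:identifier>
    <dc:title>Sample Book</dc:title>
    <dc:language>en</dc:language>
    <meta property="dcterms:modified">2012-01-01T00:00:00Z</meta>
  </metadata>
  <manifest>
    <item id="nav" href="nav.xhtml" media-type="application/xhtml+xml" properties="nav"/>
    <item id="ch1" href="chapter1.xhtml" media-type="application/xhtml+xml"/>
  </manifest>
  <spine>
    <itemref idref="ch1"/>
  </spine>
</package>"""

NAV_XHTML = """<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops">
<head><title>Contents</title></head>
<body><nav epub:type="toc"><ol><li><a href="chapter1.xhtml">Chapter 1</a></li></ol></nav></body>
</html>"""

CHAPTER_XHTML = """<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>Chapter 1</title></head>
<body><h1>Chapter 1</h1><p>Body text goes here.</p></body>
</html>"""


def build_epub() -> bytes:
    """Return the bytes of a minimal EPUB 3 container."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # "mimetype" must be the first entry and stored uncompressed,
        # so reading systems can identify the file from its first bytes.
        zf.writestr("mimetype", "application/epub+zip", compress_type=zipfile.ZIP_STORED)
        # The remaining entries use the archive's default (deflate) compression.
        zf.writestr("META-INF/container.xml", CONTAINER_XML)
        zf.writestr("OEBPS/content.opf", CONTENT_OPF)
        zf.writestr("OEBPS/nav.xhtml", NAV_XHTML)
        zf.writestr("OEBPS/chapter1.xhtml", CHAPTER_XHTML)
    return buf.getvalue()
```

For example, `pathlib.Path("sample.epub").write_bytes(build_epub())` writes the skeleton to disk. A real book adds one manifest item and spine entry per chapter, plus images and CSS.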
Things to Keep in Mind When Converting from HTML
• Smaller screen size
• Large tables may not fit
• Not all character sets are supported by all devices
• MathML is rarely supported
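A quick pre-scan can flag the HTML constructs listed above before conversion starts. This Python sketch uses the standard-library `HTMLParser`. The four-column table limit and the Latin-1 cut-off for "unusual" characters are assumed thresholds for illustration, not device specifications, and column counting (including `colspan`) is approximate.

```python
from html.parser import HTMLParser


class EReaderRiskScanner(HTMLParser):
    """Flags HTML content that often needs manual attention on eReaders."""

    def __init__(self, max_columns=4, max_codepoint=0xFF):
        super().__init__()  # convert_charrefs=True, so &alpha; arrives as a character
        self.max_columns = max_columns      # assumed "too wide for a small screen" limit
        self.max_codepoint = max_codepoint  # assumed safe range: Latin-1
        self.findings = []
        self._row_cells = 0
        self._widest_row = 0
        self._rare_chars = set()

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._widest_row = 0            # nested tables are not tracked separately
        elif tag == "tr":
            self._row_cells = 0
        elif tag in ("td", "th"):
            span = dict(attrs).get("colspan") or "1"
            self._row_cells += int(span) if span.isdigit() else 1
            self._widest_row = max(self._widest_row, self._row_cells)
        elif tag == "math":
            self.findings.append("MathML equation (rarely supported)")

    def handle_endtag(self, tag):
        if tag == "table" and self._widest_row > self.max_columns:
            self.findings.append(f"table with {self._widest_row} columns may not fit")

    def handle_data(self, data):
        # Collect characters above the assumed safe code point.
        self._rare_chars.update(ch for ch in data if ord(ch) > self.max_codepoint)

    def report(self):
        """Return findings, plus any characters outside the assumed safe range."""
        result = list(self.findings)
        if self._rare_chars:
            chars = "".join(sorted(self._rare_chars))
            result.append(f"characters beyond U+{self.max_codepoint:04X}: {chars}")
        return result
```

Usage: create a scanner, call `feed(html)` and `close()`, then read `report()`. The output is a checklist for a human reviewer, not an automatic fix.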
Some Things to Keep in Mind When Converting from PDF
• Page layout concept (eBook text reflows instead of sitting on fixed pages)
• More than one column
• Index – is linking necessary?
• Objects mid-paragraph
Product #1 Automated Conversion Output
• Chapter header found mid-paragraph
• Multiple links to the same chapter heading
• Emphasis not retained
• Paragraph breaks do not match source
• Lots of extraneous data
Product #2 Automated Conversion Output
• Footnote linking character captured as plain text
• Indented formatting not retained
• Missing random characters (“ex” vs. “exact”)
• Emphasis not retained
Product #3 Automated Conversion Output
• Extra spaces around punctuation
• Missing spaces between words
Product #3 (cont’d)
• PDF running header captured as plain text on every page
• Merged paragraphs
• Unnecessary hyphens
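Some of these defects can be reduced, though not eliminated, with simple heuristics during clean-up. The Python sketch below assumes text was extracted one page at a time, with blank lines between paragraphs. It drops a first line that repeats on several pages (a running header), rejoins words hyphenated across line breaks, and tidies spacing around punctuation. The function names and the three-page repeat threshold are hypothetical. Missing spaces between words cannot be repaired this way, which is one reason human proofreading stays in the workflow.

```python
import re
from collections import Counter


def strip_running_headers(pages, min_repeats=3):
    """Drop a page's first line if that same line opens at least `min_repeats` pages."""
    first_lines = Counter(p.strip().splitlines()[0] for p in pages if p.strip())
    headers = {line for line, count in first_lines.items() if count >= min_repeats}
    cleaned = []
    for page in pages:
        lines = page.strip().splitlines()
        if lines and lines[0] in headers:
            lines = lines[1:]
        cleaned.append("\n".join(lines))
    return cleaned


def clean_paragraphs(text):
    """Rejoin hyphenated line breaks and tidy spacing; blank lines separate paragraphs."""
    paragraphs = []
    for para in re.split(r"\n\s*\n", text):
        # "conver-\nsion" -> "conversion". Caveat: genuine compounds such as
        # "well-\nknown" are joined too, so a proofreader still has to check.
        para = re.sub(r"(\w)-\n\s*(\w)", r"\1\2", para)
        para = " ".join(para.split())                 # remaining line breaks and runs of spaces
        para = re.sub(r" ([,.;:!?)])", r"\1", para)   # no space before closing punctuation
        para = para.replace("( ", "(")                # no space after an opening parenthesis
        if para:
            paragraphs.append(para)
    return "\n\n".join(paragraphs)
```

Keeping paragraph breaks as blank lines, rather than joining every line, avoids creating the merged paragraphs listed above.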
Approach to Converting PDF to an eBook
1. Log and Review Materials
2. Zoning and Text Extraction
3. Image Cropping
4. Proofreading / Clean-up
5. Styling / Pre-Tagging
6. Convert to HTML
7. Edit CSS Based on Look of Source
8. ePub Creation
9. Validate
10-12. Final Quality Control View and Final Delivery
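The Styling / Pre-Tagging and Convert to HTML steps can be illustrated together: once paragraphs carry style names, converting them to HTML is largely a mapping exercise. In this Python sketch the style vocabulary (`chapter-title`, `body`, `extract`) and the stylesheet name `book.css` are hypothetical. The original style name is kept as a `class`, so the CSS can later be edited to match the look of the source.

```python
from html import escape

# Hypothetical pre-tagging vocabulary mapped to XHTML elements.
STYLE_TO_ELEMENT = {"chapter-title": "h1", "body": "p", "extract": "blockquote"}


def to_xhtml(title, tagged_paragraphs):
    """Render (style, text) pairs from pre-tagging as one XHTML chapter file."""
    body = []
    for style, text in tagged_paragraphs:
        element = STYLE_TO_ELEMENT.get(style, "p")  # unknown styles fall back to <p>
        # escape() turns &, < and > (and quotes) into entities so the XHTML stays well-formed.
        body.append(f'  <{element} class="{escape(style)}">{escape(text)}</{element}>')
    return "\n".join([
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<html xmlns="http://www.w3.org/1999/xhtml">',
        f"<head><title>{escape(title)}</title>",
        '<link rel="stylesheet" type="text/css" href="book.css"/></head>',
        "<body>",
        *body,
        "</body>",
        "</html>",
    ])
```

Falling back to `<p>` keeps the conversion running when an unexpected style appears; the unmatched `class` value then shows up during CSS editing and quality control.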
Tools for ePub Validation
• ePubCheck – validates against the ePub standard
code.google.com/p/epubcheck
• ePubPreflight – checks for device-specific issues
threepress.org/document/epub-validate
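ePubCheck is a Java tool, typically run as `java -jar epubcheck.jar book.epub`. Before a full validation run, a few container-level rules can be checked cheaply. This Python sketch checks only that `mimetype` is the first, uncompressed entry with the right content, and that `META-INF/container.xml` points to a package document that exists in the archive. It is a pre-check under those assumptions, not a substitute for ePubCheck or for device-specific review.

```python
import zipfile
import xml.etree.ElementTree as ET

CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}


def precheck_epub(path_or_file):
    """Return container-level problems; an empty list means only these checks passed."""
    problems = []
    with zipfile.ZipFile(path_or_file) as zf:
        infos = zf.infolist()
        if not infos or infos[0].filename != "mimetype":
            problems.append("first entry is not 'mimetype'")
        else:
            if infos[0].compress_type != zipfile.ZIP_STORED:
                problems.append("'mimetype' entry is compressed")
            if zf.read("mimetype") != b"application/epub+zip":
                problems.append("'mimetype' content is not application/epub+zip")
        names = set(zf.namelist())
        if "META-INF/container.xml" not in names:
            problems.append("missing META-INF/container.xml")
            return problems
        try:
            root = ET.fromstring(zf.read("META-INF/container.xml"))
        except ET.ParseError as err:
            problems.append(f"container.xml is not well-formed: {err}")
            return problems
        # Each <rootfile full-path="..."> should name a package document in the archive.
        rootfiles = root.findall("c:rootfiles/c:rootfile", CONTAINER_NS)
        if not rootfiles:
            problems.append("container.xml declares no rootfile")
        for rootfile in rootfiles:
            target = rootfile.get("full-path")
            if target not in names:
                problems.append(f"package document {target!r} not found")
    return problems
```

`precheck_epub("book.epub")` accepts a path or an open binary file, so it can run as a fast gate in a build script before ePubCheck itself.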
Some Notes on the Kindle
Traditional Kindle
• Designed for reading long documents
• Designed for simplicity
• Has some features that others don’t
• But it also lacks some features that others have
• Therefore, the conversion needs to be designed differently
Kindle Fire
• Supports the KF8 format, which allows richer styling, the CSS float
property, drop caps, and some HTML5 tags
• However, these new features are not backward compatible with older Kindles
What We Learned
• For most materials, automated conversion isn’t ready for prime time
• Since different devices render differently, multiple outputs are
recommended
• Special content requires special attention
• Review your converted content on its intended device
• It’s your book – it’s worth the effort to make it come out right!
Questions & Answers
Data Conversion Laboratory
61-18 190th St., 2nd Floor
Fresh Meadows, NY 11365
Telephone: (718) 357-8700
Fax: (718) 357-8776
Web: http://www.dclab.com
Mark Gross, President
718-307-5711