Approaches to document/report generation

Document Generation Do’s and Don’ts Jason Harrop Plutext Pty Ltd

Post on 29-Nov-2014

DESCRIPTION

Presents approaches for programmatically creating Office files. Targeted at developers. Presented at http://osdc.com.au/talks/generating-documents-tools-and-techniques

TRANSCRIPT

Page 1: Approaches to document/report generation

Document Generation Do’s and Don’ts

Jason Harrop, Plutext Pty Ltd

Page 2: Approaches to document/report generation

www.docx4java.org

Where I’m coming from…

• docx4j is an ASLv2 library for (Microsoft) Open XML office documents (docx, pptx, xlsx)

• My company Plutext sponsors that project

• docx4j started in 2007

Page 3: Approaches to document/report generation

Since its introduction in 2007, docx4j has become quite popular.

Page 4: Approaches to document/report generation

Comparables

tool                  | Open XML SDK  | docx4j           | POI          | Aspose
vendor                | Microsoft     | Plutext          | Apache       | Aspose
language              | .NET (C# etc) | Java             | Java         | Java
cost                  | free          | free             | free         | expensive
open source           | no            | yes (ASL v2)     | yes (ASL v2) | no
marshalling framework | .NET          | JAXB (even MOXy) | XMLBeans     | JAXB

Page 5: Approaches to document/report generation

Page 6: Approaches to document/report generation

Choose your hub format; import/export from/to others

[Diagram: hub-format options involving XHTML, PDF and docx, with some candidates marked as questionable ("docx?", "PDF?")]

• If you need to replicate the appearance of existing Office documents, using the Microsoft formats as your “hub” will avoid lots of pain

• If you can, work with the OpenXML formats, not the legacy binary ones, or Word 2003 XML, or Word HTML

• LibreOffice/OpenOffice is a useful tool for conversion, driven by JODConverter

Page 7: Approaches to document/report generation

Open XML

• standardised via ECMA 376 and ISO/IEC 29500

• includes XSD
  – can generate strongly typed classes

[Diagram: two workflows. Open → Unzip → Alter XML; or Open → Unzip → Unmarshal → Manipulate objects]
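The first workflow (open, unzip, alter XML) can be sketched in a few lines of Python using only the standard library. This is an illustration, not docx4j's approach: the helper names are made up for the example, and the demo package contains only word/document.xml, so it is not a complete docx that Word would open.

```python
import io
import zipfile

DOC_PART = "word/document.xml"
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def make_package(parts):
    """Zip a dict of {part name: bytes} into an in-memory package."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, data in parts.items():
            zf.writestr(name, data)
    return buf.getvalue()


def read_package(package_bytes):
    """Unzip a package into a dict of {part name: bytes}."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as zf:
        return {name: zf.read(name) for name in zf.namelist()}


def alter_document_xml(package_bytes, transform):
    """Open, unzip, apply transform(str) -> str to the main part, re-zip."""
    parts = read_package(package_bytes)
    xml = parts[DOC_PART].decode("utf-8")
    parts[DOC_PART] = transform(xml).encode("utf-8")
    return make_package(parts)


# Demo on a stripped-down package (a real docx has more parts).
document_xml = (
    '<w:document xmlns:w="%s"><w:body><w:p><w:r>'
    "<w:t>Hello</w:t></w:r></w:p></w:body></w:document>" % W
)
original = make_package({DOC_PART: document_xml.encode("utf-8")})
altered = alter_document_xml(original, lambda xml: xml.replace("Hello", "Goodbye"))
result_xml = read_package(altered)[DOC_PART].decode("utf-8")
```

The second workflow replaces the string manipulation with typed objects generated from the XSD, which is what the marshalling frameworks in the comparison table are for.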

Page 8: Approaches to document/report generation

Authoring time vs generation time

[Diagram: at authoring time, the question is what skills authors need; at generation time, data goes in and docx, PDF or HTML comes out]

Page 9: Approaches to document/report generation

Approach 1:- Variable replacement.

This approach can also be used for pptx, xlsx

Page 10: Approaches to document/report generation

What could be simpler?
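As a rough sketch of why this looks simple: variable replacement is a search and replace over the document XML. The `${name}` placeholder syntax and the `replace_variables` helper below are assumptions made for the example; the one detail that is easy to miss is that values must be XML-escaped.

```python
import re
from xml.sax.saxutils import escape

# Assumed placeholder convention for this sketch: ${key}
PLACEHOLDER = re.compile(r"\$\{(\w+)\}")


def replace_variables(document_xml, values):
    """Replace each ${key} with its XML-escaped value; leave unknown keys alone."""
    def substitute(match):
        key = match.group(1)
        if key not in values:
            return match.group(0)
        return escape(str(values[key]))

    return PLACEHOLDER.sub(substitute, document_xml)


template_xml = "<w:p><w:r><w:t>Dear ${name}, welcome to ${company}.</w:t></w:r></w:p>"
output_xml = replace_variables(template_xml, {"name": "Ann & Bob", "company": "Plutext"})
```

This only works when the whole placeholder sits inside a single text element.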

Page 11: Approaches to document/report generation

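Ummm… not so fast. Word can split a placeholder across several runs, so a simple search may not find it. Typical causes:

1. spelling/grammar proofing

2. rsid (revision identifiers)

3. run formatting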
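A small illustration of the problem, using a hand-written paragraph of the kind Word can produce (the rsid value and the exact split are made up for the example). The placeholder is not present as a contiguous string in the XML, but it is there once the run text is joined:

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}

# One logical placeholder, split by an rsid, proofing markers and run formatting.
paragraph_xml = (
    '<w:p xmlns:w="%s">'
    '<w:r w:rsidR="00A1B2C3"><w:t>${na</w:t></w:r>'
    '<w:proofErr w:type="spellStart"/>'
    "<w:r><w:rPr><w:b/></w:rPr><w:t>me</w:t></w:r>"
    '<w:proofErr w:type="spellEnd"/>'
    "<w:r><w:t>}</w:t></w:r>"
    "</w:p>" % W
)


def paragraph_text(paragraph):
    """Concatenate the text of every w:t in the paragraph, in document order."""
    return "".join(t.text or "" for t in paragraph.iterfind(".//w:t", NS))


paragraph = ET.fromstring(paragraph_xml)
naive_hit = "${name}" in paragraph_xml               # search over the raw XML
joined_hit = "${name}" in paragraph_text(paragraph)  # search over the joined run text
```

Finding the placeholder is the easy half; writing the value back means deciding which run's formatting wins and removing the leftover runs.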

Page 12: Approaches to document/report generation

Look for a solution which maintains integrity

• Typically a Word Add-In or macro which ensures integrity

• This suggestion applies to approaches #2 and #3 as well

Page 13: Approaches to document/report generation

Additional requirement: repeating data (list items, table rows)

• can be done using some convention, for example:
  [#list developers as developer] ${developer.name}[/#list]

• many systems invent their own (eg HotDocs)

• but FreeMarker or Velocity template language can be used to do this:
  – http://freemarker.sourceforge.net/
  – http://velocity.apache.org/

• for example:
  – XDocReport (FreeMarker or Velocity; open source)

• (this templating approach can also be used with OpenOffice documents)
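To make the repeat convention concrete, here is a toy expander for the `[#list … as …]` example above. It is not FreeMarker and does not use its API; the function names are invented for this sketch, it works on plain strings, and it does not handle nested lists.

```python
import re

LIST_BLOCK = re.compile(r"\[#list (\w+) as (\w+)\](.*?)\[/#list\]", re.DOTALL)
VARIABLE = re.compile(r"\$\{(\w+)\.(\w+)\}")


def render_body(body, alias, item):
    """Fill ${alias.field} references in one copy of the repeated body."""
    def fill(var):
        if var.group(1) != alias:
            return var.group(0)
        return str(item[var.group(2)])

    return VARIABLE.sub(fill, body)


def expand_lists(template, data):
    """Repeat each [#list xs as x]...[/#list] body once per item in data[xs]."""
    def expand(match):
        collection, alias, body = match.groups()
        return "".join(render_body(body, alias, item) for item in data[collection])

    return LIST_BLOCK.sub(expand, template)


template = "Team:[#list developers as developer] ${developer.name};[/#list]"
rendered = expand_lists(template, {"developers": [{"name": "Ada"}, {"name": "Linus"}]})
```

In a docx the repeated body is a list item or table row rather than a string fragment, so the directive markers have to sit in the right places in the XML for the repeated output to stay well-formed.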

Page 15: Approaches to document/report generation

Additional requirement: images

• Now it is starting to get a bit trickier, because inserting an image requires:
  – adding an image part to the docx package
  – making a note of its rel id
  – replacing the placeholder with the image XML, including the rel id
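The three steps can be sketched against an unzipped package held as a dict of part name to bytes. Everything here is illustrative: the helper names are invented, the image bytes are a stand-in, and the drawing markup is a stub (real DrawingML for an inline picture is considerably longer). The sketch also skips registering the image content type in [Content_Types].xml, which a real package needs.

```python
import xml.etree.ElementTree as ET

REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
OFFICE_RELS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
IMAGE_REL_TYPE = OFFICE_RELS + "/image"
DOC_PART = "word/document.xml"
RELS_PART = "word/_rels/document.xml.rels"

ET.register_namespace("", REL_NS)  # serialise relationships without a prefix


def add_image(parts, image_name, image_bytes):
    """Steps 1 and 2: add the image part and its relationship; return the rel id."""
    parts["word/media/" + image_name] = image_bytes
    rels = ET.fromstring(parts[RELS_PART])
    used = {rel.get("Id") for rel in rels}
    n = 1
    while "rId%d" % n in used:
        n += 1
    rel_id = "rId%d" % n
    ET.SubElement(rels, "{%s}Relationship" % REL_NS,
                  {"Id": rel_id, "Type": IMAGE_REL_TYPE, "Target": "media/" + image_name})
    parts[RELS_PART] = ET.tostring(rels, encoding="utf-8")
    return rel_id


def insert_image(parts, placeholder, image_name, image_bytes, drawing_template):
    """Step 3: swap the placeholder for drawing XML that carries the rel id."""
    rel_id = add_image(parts, image_name, image_bytes)
    xml = parts[DOC_PART].decode("utf-8")
    drawing = drawing_template.format(rel_id=rel_id)
    parts[DOC_PART] = xml.replace(placeholder, drawing).encode("utf-8")
    return rel_id


parts = {
    DOC_PART: b"<w:p><w:r><w:t>${logo}</w:t></w:r></w:p>",
    RELS_PART: ('<Relationships xmlns="%s"><Relationship Id="rId1" Type="%s/styles" '
                'Target="styles.xml"/></Relationships>' % (REL_NS, OFFICE_RELS)).encode("utf-8"),
}
STUB_DRAWING = '<w:drawing><a:blip r:embed="{rel_id}"/></w:drawing>'  # abbreviated stub
rel_id = insert_image(parts, "<w:t>${logo}</w:t>", "image1.png", b"not a real PNG", STUB_DRAWING)
```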

Page 16: Approaches to document/report generation

Approach 2:- MERGEFIELD and other fields

• Fields are a long standing feature of Word, included in the Open XML specification

• so lots of documents use this (aka mail merge)

• Various other useful field types eg IF

• A partial solution to the integrity problems of Approach 1

Page 17: Approaches to document/report generation

But, two unpleasant XML hybrids (simple and complex)

Simple field:

<w:fldSimple w:instr=" MERGEFIELD name ">
  <w:r>
    <w:t>«name»</w:t>
  </w:r>
</w:fldSimple>

Complex field:

<w:r><w:fldChar w:fldCharType="begin"/></w:r>
<w:r><w:instrText xml:space="preserve"> MERGEFIELD name </w:instrText></w:r>
<w:r><w:fldChar w:fldCharType="separate"/></w:r>
<w:r><w:t>«name»</w:t></w:r>
<w:r><w:fldChar w:fldCharType="end"/></w:r>
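For the simple form, filling a field amounts to reading the field name from w:instr and overwriting the result text. The sketch below does only that, with an invented helper name; it ignores field switches, leaves the field wrapper in place, and does not attempt the complex form, where the instruction and result are spread across sibling runs.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}


def merge_simple_fields(root, record):
    """Overwrite the result text of each w:fldSimple MERGEFIELD found in record."""
    filled = 0
    for field in root.iter("{%s}fldSimple" % W):
        instr = (field.get("{%s}instr" % W) or "").split()
        if len(instr) < 2 or instr[0].upper() != "MERGEFIELD":
            continue
        name = instr[1].strip('"')
        texts = field.findall(".//w:t", NS)
        if name not in record or not texts:
            continue
        texts[0].text = str(record[name])  # first w:t receives the value
        for extra in texts[1:]:            # any further result text is blanked
            extra.text = ""
        filled += 1
    return filled


field_xml = (
    '<w:p xmlns:w="%s"><w:fldSimple w:instr=" MERGEFIELD name ">'
    "<w:r><w:t>«name»</w:t></w:r></w:fldSimple></w:p>" % W
)
root = ET.fromstring(field_xml)
count = merge_simple_fields(root, {"name": "Jason"})
result_text = "".join(t.text or "" for t in root.iter("{%s}t" % W))
```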

Page 18: Approaches to document/report generation

Approach 3:- Content controls

Page 19: Approaches to document/report generation

Much nicer XML, and XPath binding

<w:sdt>
  <w:sdtPr>
    <w:alias w:val="name"/>
    <w:tag w:val="od:xpath=ribxv"/>
    <w:id w:val="13144269"/>
    <w:dataBinding w:xpath="/oda:answers/oda:answer[@id='name_Wt']"/>
  </w:sdtPr>
  <w:sdtContent>
    <w:r>
      <w:t>«name»</w:t>
    </w:r>
  </w:sdtContent>
</w:sdt>
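A sketch of what applying such a binding amounts to: read w:xpath from the content control, look the node up in the bound XML, and copy its text into the control's content. Assumptions to note: the namespace URI used for the `oda` prefix is a guess (the slide omits the prefix mappings), the sample answers XML is invented, and `resolve` handles only simple prefixed absolute paths because ElementTree supports just a subset of XPath.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ODA = "http://opendope.org/answers"  # assumed URI for the oda prefix
NS = {"w": W, "oda": ODA}


def resolve(data_root, xpath, ns):
    """Evaluate a simple absolute path such as /p:root/p:child[@id='x']."""
    first, _, rest = xpath.lstrip("/").partition("/")
    prefix, _, local = first.partition(":")
    if data_root.tag != "{%s}%s" % (ns[prefix], local):
        return None
    return data_root if not rest else data_root.find(rest, ns)


def apply_bindings(doc_root, data_root):
    """Copy bound values into the first w:t of each data-bound content control."""
    bound = 0
    for sdt in doc_root.iter("{%s}sdt" % W):
        binding = sdt.find("w:sdtPr/w:dataBinding", NS)
        target = sdt.find("w:sdtContent//w:t", NS)
        if binding is None or target is None:
            continue
        node = resolve(data_root, binding.get("{%s}xpath" % W), NS)
        if node is not None:
            target.text = node.text
            bound += 1
    return bound


doc_root = ET.fromstring(
    '<w:p xmlns:w="%s"><w:sdt><w:sdtPr><w:alias w:val="name"/>'
    '<w:tag w:val="od:xpath=ribxv"/><w:id w:val="13144269"/>'
    '<w:dataBinding w:xpath="/oda:answers/oda:answer[@id=\'name_Wt\']"/></w:sdtPr>'
    "<w:sdtContent><w:r><w:t>«name»</w:t></w:r></w:sdtContent></w:sdt></w:p>" % W
)
data_root = ET.fromstring(
    '<oda:answers xmlns:oda="%s"><oda:answer id="name_Wt">Jason Harrop</oda:answer>'
    "</oda:answers>" % ODA
)
bound_count = apply_bindings(doc_root, data_root)
bound_text = doc_root.find(".//w:sdtContent//w:t", NS).text
```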

Page 20: Approaches to document/report generation

Content controls are nice

• Better solution integrity wise

• Can bind via XPath to arbitrary XML

• handles images

• since Word 2007

• can nest, so repeats/conditions work well
  – unlike Approaches 1 & 2
  – table row friendly

• w:tag supports arbitrary data

… But unique to Open XML. (Could/should a revised ODF support similar?)

Page 21: Approaches to document/report generation

Repeats/conditions

• applies to content inside

• w:dataBinding doesn’t support these

• so create your own semantics

• OpenDoPE is one way

• use w:tag for implementation

• need an editing tool to insert repeats/conditions
  – for OpenDoPE, there are Word Add-Ins designed for technical and non-technical users

• at generation time, need code to support them
  – docx4j does this, and other OpenXML libraries could be extended to support

• can support complex documents (nested repeats etc)
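To show what "create your own semantics" in w:tag can look like at generation time, here is a toy repeat processor. The `repeat=<key>` tag format is a made-up convention for this sketch, not OpenDoPE's actual tag syntax, and the code is not how docx4j implements it. It clones the content of a tagged control once per data item and does not handle nested repeats.

```python
import copy
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}
PREFIX = "repeat="  # hypothetical tag convention for this sketch


def expand_repeats(root, data):
    """Clone the content of each repeat-tagged control once per item in data[key]."""
    for sdt in list(root.iter("{%s}sdt" % W)):
        tag = sdt.find("w:sdtPr/w:tag", NS)
        value = (tag.get("{%s}val" % W) or "") if tag is not None else ""
        content = sdt.find("w:sdtContent", NS)
        if not value.startswith(PREFIX) or content is None:
            continue
        template = list(content)       # the authored content, e.g. one paragraph or row
        for child in template:
            content.remove(child)
        for item in data.get(value[len(PREFIX):], []):
            for child in template:
                clone = copy.deepcopy(child)
                for t in clone.iter("{%s}t" % W):
                    t.text = str(item)  # toy rule: every w:t shows the item
                content.append(clone)


body = ET.fromstring(
    '<w:body xmlns:w="%s"><w:sdt><w:sdtPr><w:tag w:val="repeat=developers"/></w:sdtPr>'
    "<w:sdtContent><w:p><w:r><w:t>«developer»</w:t></w:r></w:p></w:sdtContent>"
    "</w:sdt></w:body>" % W
)
expand_repeats(body, {"developers": ["Ada", "Linus"]})
repeated = [t.text for t in body.iter("{%s}t" % W)]
```

Because the control wraps whole paragraphs or table rows, each clone is a complete structural unit, which is why repeats work better here than with inline placeholders.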

Page 22: Approaches to document/report generation

Choose your poison

• docx4j supports all three approaches
  – but content controls are strongly recommended

• other libraries offer more or less support for each approach

Page 23: Approaches to document/report generation

Thanks!