power builder

Introduction

The OpenDocument Format (ODF) is an XML-based file format for representing electronic documents such as spreadsheets, charts, presentations and word processing documents. The standard was developed by OASIS (Organization for the Advancement of Structured Information Standards), and it is a free and open format.

The OpenDocument format is used in both free and proprietary software. The format was originally implemented by the OpenOffice.org office suite and, since Office 2007 SP2, Microsoft Office also supports a subset of ODF.

This article explains the basics of the ODF format, and specifically its implementation in spreadsheet applications (OpenOffice.org Calc and Microsoft Office Excel 2007 SP2). It presents a demo application that writes tabular data to, and reads it from, .ods files. The application is written in C# using Visual Studio 2010. The .ods files it creates can be opened with Excel 2007 SP2 or later and with OpenOffice.org Calc.

ODF Format

The OpenDocument format supports two ways of representing a document:

* As a single XML document
* As a collection of several subdocuments within a package

Office applications use the second approach, so we will explain it in detail.

Every ODF file is a collection of several subdocuments within a package (ZIP file), each of which stores part of the complete document. Each subdocument stores a particular aspect of the document. For example, one subdocument contains the style information and another subdocument contains the content of the document.

This approach has the following benefits:

* You don't need to process the entire file in order to extract specific data.
* Images and multimedia are now encoded in native format, not as text streams.
* Files are smaller as a result of compression and native multimedia storage.

There are four subdocuments in the package that contain the file's data:

* content.xml - Document content and automatic styles used in the content
* styles.xml - Styles used in the document content and automatic styles used in the styles themselves
* meta.xml - Document meta information, such as the author or the time of the last save action
* settings.xml - Application-specific settings, such as the window size or printer information

Besides these, the package can contain many other subdocuments, such as a document thumbnail, images, etc.

In order to read the data from an ODF file, you need to:

1. Open the package as a ZIP archive
2. Find the parts that contain the data you want to read
3. Read the parts you are interested in
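The three reading steps above can be sketched in a few lines. This is only an illustration under stated assumptions: it uses the System.IO.Compression.ZipArchive class (available from .NET 4.5) instead of the DotNetZip component that the demo uses, and the class and method names (OdfPackageReader, ListParts, ReadPart) are hypothetical.

```csharp
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;

// Hypothetical helper, not part of the demo application.
// Assumption: System.IO.Compression (.NET 4.5+) is used instead of DotNetZip.
public static class OdfPackageReader
{
    // Steps 1 and 2: open the package as a ZIP archive and list its parts.
    public static List<string> ListParts(Stream packageStream)
    {
        var names = new List<string>();
        using (var archive = new ZipArchive(packageStream, ZipArchiveMode.Read, leaveOpen: true))
        {
            foreach (ZipArchiveEntry entry in archive.Entries)
                names.Add(entry.FullName);
        }
        return names;
    }

    // Step 3: read one part (for example "content.xml") as text.
    // Returns null when the package has no part with that name.
    public static string ReadPart(Stream packageStream, string partName)
    {
        using (var archive = new ZipArchive(packageStream, ZipArchiveMode.Read, leaveOpen: true))
        {
            ZipArchiveEntry entry = archive.GetEntry(partName);
            if (entry == null)
                return null;

            using (var reader = new StreamReader(entry.Open()))
                return reader.ReadToEnd();
        }
    }
}
```

For a file on disk, the stream could come from File.OpenRead, and ReadPart(stream, "content.xml") would then return the XML that holds the sheets.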

On the other hand, if you want to create a new ODF file, you need to:

1. Create or get all necessary parts
2. Package everything into a ZIP file with the appropriate extension
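The packaging step can be sketched as follows. This is a rough illustration, not the demo's code: it assumes System.IO.Compression (.NET 4.5+) rather than DotNetZip, the names OdfPackageWriter and WritePackage are hypothetical, and the caller is assumed to supply every required part (including META-INF/manifest.xml). ODF packages conventionally start with an uncompressed entry called mimetype, so the sketch writes that entry first and requests no compression for it.

```csharp
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical helper, not part of the demo application.
// Assumption: the caller passes all parts (content.xml, styles.xml,
// META-INF/manifest.xml, ...) as name/content pairs.
public static class OdfPackageWriter
{
    public const string SpreadsheetMimeType =
        "application/vnd.oasis.opendocument.spreadsheet";

    public static void WritePackage(Stream output, IDictionary<string, string> parts)
    {
        using (var archive = new ZipArchive(output, ZipArchiveMode.Create, leaveOpen: true))
        {
            // The mimetype entry goes first and is written without compression.
            WriteEntry(archive, "mimetype", SpreadsheetMimeType, CompressionLevel.NoCompression);

            foreach (KeyValuePair<string, string> part in parts)
                WriteEntry(archive, part.Key, part.Value, CompressionLevel.Optimal);
        }
    }

    private static void WriteEntry(ZipArchive archive, string name, string content, CompressionLevel level)
    {
        ZipArchiveEntry entry = archive.CreateEntry(name, level);

        // UTF-8 without a byte order mark, so the mimetype text starts at byte 0.
        using (var writer = new StreamWriter(entry.Open(), new UTF8Encoding(false)))
            writer.Write(content);
    }
}
```

Writing to a FileStream whose name ends in .ods covers the second step. Whether a particular office application accepts the result also depends on the parts themselves, which is why the demo starts from a template file instead.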

Spreadsheet Documents

Spreadsheet document files are a subset of ODF files. Spreadsheet files have the .ods file extension.

The content (sheets) is stored in the content.xml subdocument.

Picture 1: content.xml subdocument.

As we can see in Picture 1, sheets are stored as XML elements. They contain column and row definitions, rows contain cells, and so on. The picture shows data from one specific document, but it illustrates the basic structure of the content.xml file (you can also download the full ODF specification).
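To make that structure concrete, here is a small hand-written content.xml fragment embedded in a C# class, together with a method that lists the sheet names. The fragment is an illustrative assumption (a real file contains many more namespaces, styles and attributes), and the names ContentXmlSample and GetSheetNames are hypothetical. The element names and namespace URIs are the ones used by the demo's source code.

```csharp
using System.Collections.Generic;
using System.Xml;

// Hypothetical sample, not taken from a real document.
public static class ContentXmlSample
{
    private const string OfficeNs = "urn:oasis:names:tc:opendocument:xmlns:office:1.0";
    private const string TableNs = "urn:oasis:names:tc:opendocument:xmlns:table:1.0";

    // A stripped-down content.xml: one sheet, two columns, one row.
    public const string Xml =
@"<office:document-content
    xmlns:office='urn:oasis:names:tc:opendocument:xmlns:office:1.0'
    xmlns:table='urn:oasis:names:tc:opendocument:xmlns:table:1.0'
    xmlns:text='urn:oasis:names:tc:opendocument:xmlns:text:1.0'>
  <office:body>
    <office:spreadsheet>
      <table:table table:name='Sheet1'>
        <table:table-column table:number-columns-repeated='2'/>
        <table:table-row>
          <table:table-cell office:value-type='string'><text:p>Name</text:p></table:table-cell>
          <table:table-cell office:value-type='float' office:value='42'><text:p>42</text:p></table:table-cell>
        </table:table-row>
      </table:table>
    </office:spreadsheet>
  </office:body>
</office:document-content>";

    // Returns the table:name attribute of every sheet (table:table element).
    public static List<string> GetSheetNames(string contentXml)
    {
        var document = new XmlDocument();
        document.LoadXml(contentXml);

        var namespaces = new XmlNamespaceManager(document.NameTable);
        namespaces.AddNamespace("office", OfficeNs);
        namespaces.AddNamespace("table", TableNs);

        var names = new List<string>();
        XmlNodeList tables = document.SelectNodes(
            "/office:document-content/office:body/office:spreadsheet/table:table", namespaces);
        foreach (XmlElement table in tables)
            names.Add(table.GetAttribute("name", TableNs));
        return names;
    }
}
```

Each sheet is a table:table element, its rows are table:table-row elements, and each cell is a table:table-cell whose visible text sits in a text:p child.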

Implementation

Our demo is a Windows Presentation Foundation (WPF) application (Picture 2) written in C# using Visual Studio 2010.

Picture 2: Demo application.

The application can:

* create a new spreadsheet document.
* read an existing spreadsheet document.
* write a created spreadsheet document.

Creating New Document and Underlying Model of Application

Internally, a spreadsheet document is stored as a DataSet. Each sheet is represented by a DataTable, each sheet row by a DataRow, and each sheet column by a DataColumn. So, to create a new document, we have to create a new DataSet with DataTables. Each DataTable has as many rows and columns as we need.
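A minimal sketch of that model is shown below. The class name SpreadsheetModel, the method name CreateEmptyDocument and the sheet naming scheme ("Sheet1", "Sheet2", ...) are assumptions made for illustration; the demo may size and name its tables differently.

```csharp
using System.Data;

// Hypothetical helper, not part of the demo application.
public static class SpreadsheetModel
{
    // Creates a DataSet with one DataTable per sheet; every sheet gets
    // the requested number of (empty) rows and columns.
    public static DataSet CreateEmptyDocument(string name, int sheetCount, int rowCount, int columnCount)
    {
        var document = new DataSet(name);

        for (int s = 1; s <= sheetCount; s++)
        {
            DataTable sheet = document.Tables.Add("Sheet" + s);

            for (int c = 0; c < columnCount; c++)
                sheet.Columns.Add();            // DataColumn = sheet column

            for (int r = 0; r < rowCount; r++)
                sheet.Rows.Add(sheet.NewRow()); // DataRow = sheet row
        }

        return document;
    }
}
```

A call such as SpreadsheetModel.CreateEmptyDocument("Book1", 3, 20, 10) would give three empty sheets of 20 rows by 10 columns, where every cell initially holds DBNull.Value.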

To show the data from our DataSet (and to allow editing it), the application dynamically creates tabs with DataGridViews that are connected to our DataTables.

Through the interface, a user can read, write and edit data, and add new rows to the spreadsheet document.

Because the application essentially transforms a spreadsheet document to and from a DataSet, it can also be used as a reference for Excel-to-DataSet export and import scenarios.

Zip Component and XML Parser

Although the classes in the System.IO.Packaging namespace (.NET 3.0) provide a way to read and write ZIP files, they expect packages that follow the Open Packaging Conventions, which is a different layout from the one ODF uses. Because of that, our demo uses the open source component called DotNetZip.

Using the ZIP component, we can extract files, get a subdocument, replace (or add) the subdocuments that we want, and save the result as an .ods file (which is a ZIP file).

For processing documents, we have used XmlDocument because it offers an easy way to reach the parts that we want. Note that, if performance is crucial for you, you should use XmlTextReader and XmlTextWriter. That solution needs more work (and code), but provides better performance.
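As a rough idea of what the streaming alternative looks like, the sketch below collects the sheet names from content.xml without building a DOM. It is an illustration only: it uses the XmlReader.Create factory rather than constructing XmlTextReader directly, and the names StreamingSheetReader and ReadSheetNames are hypothetical.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical helper, not part of the demo application.
public static class StreamingSheetReader
{
    private const string TableNs = "urn:oasis:names:tc:opendocument:xmlns:table:1.0";

    // Walks the XML once, forward-only, and records the name of each sheet.
    public static List<string> ReadSheetNames(Stream contentXml)
    {
        var names = new List<string>();

        using (XmlReader reader = XmlReader.Create(contentXml))
        {
            while (reader.Read())
            {
                bool isSheet = reader.NodeType == XmlNodeType.Element
                    && reader.LocalName == "table"
                    && reader.NamespaceURI == TableNs;

                if (isSheet)
                    names.Add(reader.GetAttribute("name", TableNs));
            }
        }

        return names;
    }
}
```

The trade-off mentioned above shows even in this small case: the reader never holds the whole document in memory, but the code has to track element names and namespaces by hand instead of using an XPath query.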

Reading Spreadsheet Document

To read a document, we follow these steps:

1. Extract the .ods file
2. Get the content.xml file (which contains the sheet data)
3. Create an XmlDocument object from the content.xml file
4. Create a DataSet (which represents the spreadsheet file)
5. With the XmlDocument, select the "table:table" elements and create the corresponding DataTables
6. Parse the children of each "table:table" element and fill the DataTables with that data
7. At the end, return the DataSet and show it in the application's interface

Although the ODF specification provides a way to specify default row, column and cell styles, implementations (Excel in particular) have the nasty habit of writing a sheet with the maximum number of columns and rows, and then writing every cell with its style. So you may find that your sheet has more than 1000 columns (1024 in Calc and 16384 in Excel) and even more rows, each containing as many cells as there are columns, although data was written only to the first few rows and columns.

The ODF specification lets you specify an element (such as a column, row or cell) once and then state how many times it repeats. So the behavior above doesn't affect the size of the file, but it complicates our implementation.

Because of that, we can't just read the number of columns and add an equal number of DataColumns to the DataTable (that would cause performance issues). In this implementation, we instead read the cells and, if they have data, we first create the rows and columns they belong to, and then add those cells to the DataTable. So, in the end, we allocate only the space that we need.
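The idea of allocating space only for cells that hold data can be sketched for a single row as follows. This is not the demo's code: the names SparseRowReader and ReadRow are hypothetical, the result is a dictionary rather than a DataRow, and, unlike the listing at the end of this article (which skips every repeated cell), this sketch copies the value of a repeated cell when that cell has content.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.Xml;

// Hypothetical helper, not part of the demo application.
public static class SparseRowReader
{
    private const string TableNs = "urn:oasis:names:tc:opendocument:xmlns:table:1.0";
    private const string OfficeNs = "urn:oasis:names:tc:opendocument:xmlns:office:1.0";

    // Maps column index -> value for the cells of one table:table-row
    // element that actually contain data. Empty cells only advance the index.
    public static SortedDictionary<int, string> ReadRow(XmlElement rowElement)
    {
        var values = new SortedDictionary<int, string>();
        int columnIndex = 0;

        foreach (XmlNode child in rowElement.ChildNodes)
        {
            var cell = child as XmlElement;
            if (cell == null || cell.NamespaceURI != TableNs)
                continue;
            if (cell.LocalName != "table-cell" && cell.LocalName != "covered-table-cell")
                continue;

            // table:number-columns-repeated defaults to 1 when absent.
            int repeat = 1;
            string repeatText = cell.GetAttribute("number-columns-repeated", TableNs);
            if (repeatText.Length > 0)
                repeat = int.Parse(repeatText, CultureInfo.InvariantCulture);

            // Prefer office:value, fall back to the displayed text.
            string value = cell.HasAttribute("value", OfficeNs)
                ? cell.GetAttribute("value", OfficeNs)
                : cell.InnerText;

            if (value.Length > 0)
            {
                for (int i = 0; i < repeat; i++)
                    values[columnIndex + i] = value;
            }

            columnIndex += repeat;
        }

        return values;
    }
}
```

A trailing empty cell repeated thousands of times costs a single addition here, so the number of stored values depends on the data in the row and not on the column count declared by the application that wrote the file.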

Writing Spreadsheet Document

To write a document, we follow these steps:

1. Extract the template.ods file (the .ods file that we use as a template)
2. Get the content.xml file
3. Create an XmlDocument object from the content.xml file
4. Erase all "table:table" elements from the content.xml file
5. Read data from our DataSet and compose the corresponding "table:table" elements
6. Add the "table:table" elements to the content.xml file
7. Zip that file as a new .ods file

In this application, the template has to be an empty document. But the application can easily be modified to use some other template (so that its styles, etc. are preserved).
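The part of this procedure that touches the package itself (swapping the old content.xml for the new one while leaving every other part of the template alone) can be sketched like this. It is an illustration under assumptions: it uses System.IO.Compression (.NET 4.5+) in update mode rather than DotNetZip, the names TemplatePackage and ReplacePart are hypothetical, and the stream must be readable, writable and seekable.

```csharp
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical helper, not part of the demo application.
public static class TemplatePackage
{
    // Replaces (or adds) one part in an existing package, in place.
    // All other entries of the template are kept.
    public static void ReplacePart(Stream package, string partName, string newContent)
    {
        using (var archive = new ZipArchive(package, ZipArchiveMode.Update, leaveOpen: true))
        {
            ZipArchiveEntry existing = archive.GetEntry(partName);
            if (existing != null)
                existing.Delete();

            ZipArchiveEntry replacement = archive.CreateEntry(partName);
            using (var writer = new StreamWriter(replacement.Open(), new UTF8Encoding(false)))
                writer.Write(newContent);
        }
    }
}
```

In the demo's terms, newContent would be the XML text of the modified XmlDocument, and the stream would be a copy of template.ods that ends up as the output file.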

Download Links

You can download the latest version of the demo application (together with the C# source code) from here.

Alternative Ways

As always in programming, there is more than one method to achieve the same thing.

ODF files are just collections of XML files packed into ZIP files, so any of the vast number of tools for handling ZIP files and XML data can be used to handle OpenDocument.

As another option, you could use a third-party Excel C# / VB.NET component that supports the ODF format. This will probably cost you some money, but it has the advantage that more than one format is usually supported within the same API (for example, GemBox.Spreadsheet reads and writes XLS, XLSX, CSV, HTML and ODS), so your application will be able to target different file formats using the same code.

History

* 24th July, 2009: Initial post
* 28th July, 2011: Project converted from Visual Studio 2008 Windows Forms project to Visual Studio 2010 WPF project

Source Code

    using System;
    using System.Data;
    using System.Globalization;
    using System.IO;
    using System.Reflection;
    using System.Xml;
    using Ionic.Zip;

    namespace OdsReadWrite
    {
        internal sealed class OdsReaderWriter
        {
            // Namespaces. We need this to initialize XmlNamespaceManager so that we can search XmlDocument.
            private static string[,] namespaces = new string[,]
            {
                {"table", "urn:oasis:names:tc:opendocument:xmlns:table:1.0"},
                {"office", "urn:oasis:names:tc:opendocument:xmlns:office:1.0"},
                {"style", "urn:oasis:names:tc:opendocument:xmlns:style:1.0"},
                {"text", "urn:oasis:names:tc:opendocument:xmlns:text:1.0"},
                {"draw", "urn:oasis:names:tc:opendocument:xmlns:drawing:1.0"},
                {"fo", "urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0"},
                {"dc", "http://purl.org/dc/elements/1.1/"},
                {"meta", "urn:oasis:names:tc:opendocument:xmlns:meta:1.0"},
                {"number", "urn:oasis:names:tc:opendocument:xmlns:datastyle:1.0"},
                {"presentation", "urn:oasis:names:tc:opendocument:xmlns:presentation:1.0"},
                {"svg", "urn:oasis:names:tc:opendocument:xmlns:svg-compatible:1.0"},
                {"chart", "urn:oasis:names:tc:opendocument:xmlns:chart:1.0"},
                {"dr3d", "urn:oasis:names:tc:opendocument:xmlns:dr3d:1.0"},
                {"math", "http://www.w3.org/1998/Math/MathML"},
                {"form", "urn:oasis:names:tc:opendocument:xmlns:form:1.0"},
                {"script", "urn:oasis:names:tc:opendocument:xmlns:script:1.0"},
                {"ooo", "http://openoffice.org/2004/office"},
                {"ooow", "http://openoffice.org/2004/writer"},
                {"oooc", "http://openoffice.org/2004/calc"},
                {"dom", "http://www.w3.org/2001/xml-events"},
                {"xforms", "http://www.w3.org/2002/xforms"},
                {"xsd", "http://www.w3.org/2001/XMLSchema"},
                {"xsi", "http://www.w3.org/2001/XMLSchema-instance"},
                {"rpt", "http://openoffice.org/2005/report"},
                {"of", "urn:oasis:names:tc:opendocument:xmlns:of:1.2"},
                {"rdfa", "http://docs.oasis-open.org/opendocument/meta/rdfa#"},
                {"config", "urn:oasis:names:tc:opendocument:xmlns:config:1.0"}
            };

            // Read zip stream (.ods file is zip file).
            private ZipFile GetZipFile(Stream stream)
            {
                return ZipFile.Read(stream);
            }

            // Read zip file (.ods file is zip file).
            private ZipFile GetZipFile(string inputFilePath)
            {
                return ZipFile.Read(inputFilePath);
            }

            private XmlDocument GetContentXmlFile(ZipFile zipFile)
            {
                // Get file(in zip archive) that contains data ("content.xml").
                ZipEntry contentZipEntry = zipFile["content.xml"];

                // Extract that file to MemoryStream.
                Stream contentStream = new MemoryStream();
                contentZipEntry.Extract(contentStream);
                contentStream.Seek(0, SeekOrigin.Begin);

                // Create XmlDocument from MemoryStream (MemoryStream contains content.xml).
                XmlDocument contentXml = new XmlDocument();
                contentXml.Load(contentStream);

                return contentXml;
            }

            private XmlNamespaceManager InitializeXmlNamespaceManager(XmlDocument xmlDocument)
            {
                XmlNamespaceManager nmsManager = new XmlNamespaceManager(xmlDocument.NameTable);

                for (int i = 0; i < namespaces.GetLength(0); i++)
                    nmsManager.AddNamespace(namespaces[i, 0], namespaces[i, 1]);

                return nmsManager;
            }

            /// <summary>
            /// Read .ods file and store it in DataSet.
            /// </summary>
            /// <param name="inputFilePath">Path to the .ods file.</param>
            /// <returns>DataSet that represents .ods file.</returns>
            public DataSet ReadOdsFile(string inputFilePath)
            {
                ZipFile odsZipFile = this.GetZipFile(inputFilePath);

                // Get content.xml file
                XmlDocument contentXml = this.GetContentXmlFile(odsZipFile);

                // Initialize XmlNamespaceManager
                XmlNamespaceManager nmsManager = this.InitializeXmlNamespaceManager(contentXml);

                DataSet odsFile = new DataSet(Path.GetFileName(inputFilePath));

                foreach (XmlNode tableNode in this.GetTableNodes(contentXml, nmsManager))
                    odsFile.Tables.Add(this.GetSheet(tableNode, nmsManager));

                return odsFile;
            }

            // In ODF sheet is stored in table:table node
            private XmlNodeList GetTableNodes(XmlDocument contentXmlDocument, XmlNamespaceManager nmsManager)
            {
                return contentXmlDocument.SelectNodes("/office:document-content/office:body/office:spreadsheet/table:table", nmsManager);
            }

            private DataTable GetSheet(XmlNode tableNode, XmlNamespaceManager nmsManager)
            {
                DataTable sheet = new DataTable(tableNode.Attributes["table:name"].Value);

                XmlNodeList rowNodes = tableNode.SelectNodes("table:table-row", nmsManager);

                int rowIndex = 0;
                foreach (XmlNode rowNode in rowNodes)
                    this.GetRow(rowNode, sheet, nmsManager, ref rowIndex);

                return sheet;
            }

            private void GetRow(XmlNode rowNode, DataTable sheet, XmlNamespaceManager nmsManager, ref int rowIndex)
            {
                XmlAttribute rowsRepeated = rowNode.Attributes["table:number-rows-repeated"];
                if (rowsRepeated == null || Convert.ToInt32(rowsRepeated.Value, CultureInfo.InvariantCulture) == 1)
                {
                    while (sheet.Rows.Count < rowIndex)
                        sheet.Rows.Add(sheet.NewRow());

                    DataRow row = sheet.NewRow();

                    XmlNodeList cellNodes = rowNode.SelectNodes("table:table-cell", nmsManager);

                    int cellIndex = 0;
                    foreach (XmlNode cellNode in cellNodes)
                        this.GetCell(cellNode, row, nmsManager, ref cellIndex);

                    sheet.Rows.Add(row);

                    rowIndex++;
                }
                else
                {
                    rowIndex += Convert.ToInt32(rowsRepeated.Value, CultureInfo.InvariantCulture);
                }

                // sheet must have at least one cell
                if (sheet.Rows.Count == 0)
                {
                    sheet.Rows.Add(sheet.NewRow());
                    sheet.Columns.Add();
                }
            }

            private void GetCell(XmlNode cellNode, DataRow row, XmlNamespaceManager nmsManager, ref int cellIndex)
            {
                XmlAttribute cellRepeated = cellNode.Attributes["table:number-columns-repeated"];

                if (cellRepeated == null)
                {
                    DataTable sheet = row.Table;

                    while (sheet.Columns.Count <= cellIndex)
                        sheet.Columns.Add();

                    row[cellIndex] = this.ReadCellValue(cellNode);

                    cellIndex++;
                }
                else
                {
                    cellIndex += Convert.ToInt32(cellRepeated.Value, CultureInfo.InvariantCulture);
                }
            }

            private string ReadCellValue(XmlNode cell)
            {
                XmlAttribute cellVal = cell.Attributes["office:value"];

                if (cellVal == null)
                    return String.IsNullOrEmpty(cell.InnerText) ? null : cell.InnerText;
                else
                    return cellVal.Value;
            }

            /// <summary>
            /// Writes DataSet as .ods file.
            /// </summary>
            /// <param name="odsFile">DataSet that represent .ods file.</param>
            /// <param name="outputFilePath">The name of the file to save to.</param>
            public void WriteOdsFile(DataSet odsFile, string outputFilePath)
            {
                ZipFile templateFile = this.GetZipFile(Assembly.GetExecutingAssembly().GetManifestResourceStream("OdsReadWrite.template.ods"));

                XmlDocument contentXml = this.GetContentXmlFile(templateFile);

                XmlNamespaceManager nmsManager = this.InitializeXmlNamespaceManager(contentXml);

                XmlNode sheetsRootNode = this.GetSheetsRootNodeAndRemoveChildrens(contentXml, nmsManager);

                foreach (DataTable sheet in odsFile.Tables)
                    this.SaveSheet(sheet, sheetsRootNode);

                this.SaveContentXml(templateFile, contentXml);

                templateFile.Save(outputFilePath);
            }

            private XmlNode GetSheetsRootNodeAndRemoveChildrens(XmlDocument contentXml, XmlNamespaceManager nmsManager)
            {
                XmlNodeList tableNodes = this.GetTableNodes(contentXml, nmsManager);

                XmlNode sheetsRootNode = tableNodes.Item(0).ParentNode;

                // remove sheets from template file
                foreach (XmlNode tableNode in tableNodes)
                    sheetsRootNode.RemoveChild(tableNode);

                return sheetsRootNode;
            }

            private void SaveSheet(DataTable sheet, XmlNode sheetsRootNode)
            {
                XmlDocument ownerDocument = sheetsRootNode.OwnerDocument;

                XmlNode sheetNode = ownerDocument.CreateElement("table:table", this.GetNamespaceUri("table"));

                XmlAttribute sheetName = ownerDocument.CreateAttribute("table:name", this.GetNamespaceUri("table"));
                sheetName.Value = sheet.TableName;
                sheetNode.Attributes.Append(sheetName);

                this.SaveColumnDefinition(sheet, sheetNode, ownerDocument);

                this.SaveRows(sheet, sheetNode, ownerDocument);

                sheetsRootNode.AppendChild(sheetNode);
            }

            private void SaveColumnDefinition(DataTable sheet, XmlNode sheetNode, XmlDocument ownerDocument)
            {
                XmlNode columnDefinition = ownerDocument.CreateElement("table:table-column", this.GetNamespaceUri("table"));

                XmlAttribute columnsCount = ownerDocument.CreateAttribute("table:number-columns-repeated", this.GetNamespaceUri("table"));
                columnsCount.Value = sheet.Columns.Count.ToString(CultureInfo.InvariantCulture);
                columnDefinition.Attributes.Append(columnsCount);

                sheetNode.AppendChild(columnDefinition);
            }

            private void SaveRows(DataTable sheet, XmlNode sheetNode, XmlDocument ownerDocument)
            {
                DataRowCollection rows = sheet.Rows;
                for (int i = 0; i < rows.Count; i++)
                {
                    XmlNode rowNode = ownerDocument.CreateElement("table:table-row", this.GetNamespaceUri("table"));

                    this.SaveCell(rows[i], rowNode, ownerDocument);

                    sheetNode.AppendChild(rowNode);
                }
            }

            private void SaveCell(DataRow row, XmlNode rowNode, XmlDocument ownerDocument)
            {
                object[] cells = row.ItemArray;

                for (int i = 0; i < cells.Length; i++)
                {
                    XmlElement cellNode = ownerDocument.CreateElement("table:table-cell", this.GetNamespaceUri("table"));

                    if (row[i] != DBNull.Value)
                    {
                        // We save values as text (string)
                        XmlAttribute valueType = ownerDocument.CreateAttribute("office:value-type", this.GetNamespaceUri("office"));
                        valueType.Value = "string";
                        cellNode.Attributes.Append(valueType);

                        XmlElement cellValue = ownerDocument.CreateElement("text:p", this.GetNamespaceUri("text"));
                        cellValue.InnerText = row[i].ToString();
                        cellNode.AppendChild(cellValue);
                    }

                    rowNode.AppendChild(cellNode);
                }
            }

            private void SaveContentXml(ZipFile templateFile, XmlDocument contentXml)
            {
                templateFile.RemoveEntry("content.xml");

                MemoryStream memStream = new MemoryStream();
                contentXml.Save(memStream);
                memStream.Seek(0, SeekOrigin.Begin);

                templateFile.AddEntry("content.xml", memStream);
            }

            private string GetNamespaceUri(string prefix)
            {
                for (int i = 0; i < namespaces.GetLength(0); i++)
                {
                    if (namespaces[i, 0] == prefix)
                        return namespaces[i, 1];
                }

                throw new InvalidOperationException("Can't find that namespace URI");
            }
        }
    }
