
You should read this book if you want to extract data from an OpenOffice.org document, convert your data to an OpenOffice.org document, or simply find out how OpenOffice.org stores its data "under the hood."

O'Reilly free ebook: OpenOffice.org XML Essentials

The Proprietary World

Before we can talk about OpenOffice.org, we have to look at the current state of proprietary office suites and applications. In this world, all your documents are stored in a proprietary (often binary) format. As long as you stay within the office suite, this is not a problem. You can transfer data from one part of the suite to another; you can transfer text from the word processor to a presentation, or you can grab a set of numbers from the spreadsheet and convert it to a table in your word processing document.

The problems begin when you want to do a transfer that wasn't intended by the authors of the office suite. Because the internal structure of the data is unknown to you, you can't write a program that creates a new word processing document consisting of all the headings from a different document. If you need to do something that wasn't provided by the software vendor, or if you must process the data with an application external to the office suite, you will have to convert that data to some neutral or "universal" format such as Rich Text Format (RTF) or comma-separated values (CSV) for import into the other applications. You have to rely on the kindness of strangers to include these conversions in the first place. Furthermore, some conversions can result in loss of formatting information that was stored with your data.
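By contrast, pulling the headings out of an XML-based document takes only a few lines. Below is a minimal Python sketch. It assumes the document is a ZIP archive whose body is stored in content.xml, which is how OpenOffice.org .sxw and OpenDocument .odt files are packaged. It also assumes headings are text:h elements carrying either a text:level attribute (OpenOffice.org 1.x) or a text:outline-level attribute (OpenDocument). The helper names extract_headings and make_sample_archive are hypothetical, chosen for illustration.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Assumed "text" namespace URIs: OpenOffice.org 1.x and OpenDocument.
TEXT_NAMESPACES = {
    "http://openoffice.org/2000/text",
    "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
}


def extract_headings(source):
    """Return (level, text) pairs for every text:h element in content.xml.

    `source` may be a filename or a binary file-like object (hypothetical helper).
    """
    with zipfile.ZipFile(source) as archive:
        root = ET.fromstring(archive.read("content.xml"))

    headings = []
    for elem in root.iter():
        # ElementTree spells qualified tags as "{namespace-uri}localname".
        if not elem.tag.startswith("{"):
            continue
        uri, _, local = elem.tag[1:].partition("}")
        if uri in TEXT_NAMESPACES and local == "h":
            # OpenDocument uses text:outline-level; OpenOffice.org 1.x used text:level.
            raw = elem.get(f"{{{uri}}}outline-level") or elem.get(f"{{{uri}}}level")
            level = int(raw) if raw is not None else None
            text = "".join(elem.itertext()).strip()
            headings.append((level, text))
    return headings


def make_sample_archive():
    """Build a tiny in-memory archive standing in for a real .odt file."""
    content = (
        '<office:document-content '
        'xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0" '
        'xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0">'
        "<office:body><office:text>"
        '<text:h text:outline-level="1">Introduction</text:h>'
        "<text:p>Some body text.</text:p>"
        '<text:h text:outline-level="2">Background</text:h>'
        "</office:text></office:body></office:document-content>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("content.xml", content)
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    for level, text in extract_headings(make_sample_archive()):
        print(level, text)
```

Writing those headings into a new document is the same idea in reverse: you emit text:h elements instead of reading them. No vendor-supplied converter is needed in either direction.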

Note also that your data can become inaccessible when the software vendor moves to a new internal format and stops supporting your current version. (Some people actually suggest that this is not cause for complaint since, by putting your data into the vendor's proprietary format, the vendor has now become a co-owner of your data. This is, and I mean this in the nicest possible way, a dangerously idiotic idea.)

The OpenOffice.org Approach

OpenOffice.org has as its mission "[t]o create, as a community, the leading international office suite that will run on all major platforms and provide access to all functionality and data through open-component based APIs and an XML-based file format."

Download free ebook: Oreilly--OpenOffice.org_XML_Essentials_(Unpublished).pdf

The OpenOffice.org file format is not simply an XML wrapper for a binary format, nor a one-to-one correspondence between the XML tags and the internal data structures. Instead, it is an idealized representation of the structure. This allows future versions of OpenOffice.org to implement new features or completely alter internal data structures without requiring major changes to the file format. You can see the full details of this design decision at http://xml.openoffice.org/xml_advocacy.html
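The sketch below makes the idea of a structural representation concrete. It turns CSV rows into a table element, so a table is described in terms of columns, rows, cells, and paragraphs rather than any program's memory layout. It assumes OpenDocument namespace URIs and the element names table:table, table:table-column, table:table-row, table:table-cell and text:p. The OpenOffice.org 1.x format described in the book used http://openoffice.org/2000/... namespace URIs instead. The result is only a fragment. A complete document also needs the enclosing content.xml elements, styles, and the rest of the ZIP package. csv_to_table is a hypothetical helper.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Assumed OpenDocument namespace URIs (OpenOffice.org 1.x used different ones).
NS = {
    "table": "urn:oasis:names:tc:opendocument:xmlns:table:1.0",
    "text": "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
}
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)  # serialize as table:/text: prefixes


def q(prefix, local):
    """Build an ElementTree qualified name such as {uri}table-row."""
    return f"{{{NS[prefix]}}}{local}"


def csv_to_table(csv_text, name="Table1"):
    """Describe CSV data as a table:table element (hypothetical helper)."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    table = ET.Element(q("table", "table"), {q("table", "name"): name})

    # Declare the columns once, using a repeat count instead of one element each.
    ncols = max((len(row) for row in rows), default=0)
    if ncols:
        ET.SubElement(table, q("table", "table-column"),
                      {q("table", "number-columns-repeated"): str(ncols)})

    for row in rows:
        row_elem = ET.SubElement(table, q("table", "table-row"))
        for value in row:
            cell = ET.SubElement(row_elem, q("table", "table-cell"))
            # A cell holds ordinary paragraphs, just like running text.
            ET.SubElement(cell, q("text", "p")).text = value
    return table


if __name__ == "__main__":
    fragment = csv_to_table("Year,Sales\n2003,120\n2004,150\n")
    print(ET.tostring(fragment, encoding="unicode"))
```

Nothing in this markup says how a word processor stores the table internally. That separation is what lets the implementation change without breaking files.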