Let's talk a little about the technical choices we made for this project. Several of its requirements pointed to different technologies. The first need was to write a plug-in for Word 2007 that could open and save ODF text documents. Besides that, we also wanted to provide some command-line tools, which are very useful for development and testing. We kept in mind that to build an Open Source community, we had to use open technologies as often as possible. Finally, we had to take our own interests into account (in-house skills, cost vs. performance, development speed, etc.).

We did not have many options for the Word integration part: we could either use an (old) C API provided by Microsoft (the one used to write filters) or write a COM or .NET shared add-in. The C API would have allowed a closer integration with Word: it gives access to the file formats listed in the "Open" and "Save As" menus. But we did not feel like writing an entire converter in C - it would have been really complicated, and at Clever Age we are more skilled in newer technologies such as Java, .NET or PHP. Moreover, for obvious political reasons, we wanted to base our converter on Microsoft's new Office Open XML format - it would be an example of the new possibilities offered by XML technology. The C API was based on the good old RTF format, so we definitely ruled it out. Since we had C#/.NET skills in-house, we decided to build our plug-in on that technology.

But that was for the integration part only - that is, adding new menu entries and launching the conversions. For the conversion itself, we still had several options: we could have written the whole converter in C# (using a SAX-like approach based on event handling), or taken advantage of XML technology and used either XQuery or XSLT. The first approach would certainly have offered the most flexibility and performance. But it would have required a lot more development effort (it is less structured than the other two, and we would have had to code a lot of things that XQuery or XSLT do automatically). From Microsoft's point of view, XQuery might have been preferred, as they had announced that they would progressively give up XSLT support in the future (they don't plan to release an XSLT 2.0 engine). But the main problem with XQuery (apart from the fact that it is less geared toward document transformation than XSLT) is that there is still no XQuery engine available in the .NET framework... Once again, we had XSLT skills (we wrote a converter for OpenOffice.org 1.0 in the past) and we thought it was the best compromise between performance (the Microsoft .NET 2.0 XSLT engine is very fast) and development speed. Moreover (and not the least of the arguments from our point of view), it would allow other applications to reuse the converter in other contexts - we thought of OpenOffice.org for instance, which already had XSLT-based converters - and we therefore believed that we would have a better chance of building a community.

Before starting the project, we discussed all those possibilities with Microsoft architects working on Word and interoperability, and they approved our recommendations. There were still some technical points to decide, among them the library to use for ZIP compression/decompression (surprisingly, Microsoft doesn't provide one, and the licence we chose - BSD - prevented us from using SharpZipLib, which is released under the GPL) and the way to handle multi-file generation in XSLT (should we run several XSLT engines, or a single engine and split its XML output into several files during post-processing?). For the first issue, we ended up developing a wrapper around the unmanaged zlib library; for the second one, we chose to produce a single XML stream that is automatically split into separate files - you can find more information in the technical documentation available for download on SourceForge.
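
To give an idea of what "one XML stream, split afterwards" can look like, here is a minimal sketch in Python (not the C# code of the project). It assumes a made-up wrapper format in which the transformation emits a `package` root containing one `part` element per target file; these element names, the `name` attribute and the `split_parts` function are our own illustration, and the real format is the one described in the technical documentation.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def split_parts(single_output: str) -> Dict[str, bytes]:
    """Split one XML stream into standalone documents, keyed by file name.

    Assumed (hypothetical) layout:
      <package>
        <part name="content.xml"> ...one root element... </part>
        <part name="styles.xml">  ...one root element... </part>
      </package>
    """
    root = ET.fromstring(single_output)
    parts: Dict[str, bytes] = {}
    for part in root.findall("part"):
        name = part.get("name")
        children = list(part)
        if name is None or len(children) != 1:
            raise ValueError("each part needs a name and exactly one root element")
        document = children[0]
        # Drop text that follows the element inside the wrapper, so the
        # serialized result is a clean standalone document.
        document.tail = None
        parts[name] = ET.tostring(document, encoding="utf-8", xml_declaration=True)
    return parts
```

With this approach the stylesheet runs only once over the input; the price is that a post-processor has to know the wrapper convention.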

As both the ODF and OOX formats are ZIP-based, we had no choice but to write some pre- and post-processing steps (it is technically impossible to read or generate ZIP files using only XSLT), the question being: what should be done in XSLT and what should be delegated to the post-processor? Even today we still have to make such decisions, mainly to avoid performance issues and sometimes because of XSLT's technical restrictions (but that will be the subject of another discussion - stay tuned! ;-).
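
The ZIP side of that pre- and post-processing can be sketched as follows, again in Python with its standard `zipfile` module rather than our zlib wrapper. The function names `read_package` and `write_package` are hypothetical, and the XSLT step that would sit between them is left out. The one format-specific detail shown is that an ODF package expects its `mimetype` entry to come first and to be stored uncompressed.

```python
import io
import zipfile
from typing import Dict


def read_package(data: bytes) -> Dict[str, bytes]:
    """Pre-processing: unpack a ZIP-based document into {entry name: content}."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        return {name: archive.read(name) for name in archive.namelist()}


def write_package(parts: Dict[str, bytes]) -> bytes:
    """Post-processing: pack the generated parts into a ZIP-based document."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        # ODF convention: "mimetype" is the first entry and is not compressed.
        if "mimetype" in parts:
            archive.writestr("mimetype", parts["mimetype"],
                             compress_type=zipfile.ZIP_STORED)
        for name, content in parts.items():
            if name != "mimetype":
                archive.writestr(name, content)
    return buffer.getvalue()
```

Everything between these two functions works on plain XML, which is where XSLT can do its job.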