If you remember one of my first posts, I mentioned the need for pre- and post-processing in order to convert certain features. One of the first needs was to build a ZIP file after the conversion. For this purpose, we created a processor called "ZipArchiveWriter" that takes the XML flow produced by the XSL transformation and creates the desired ZIP entries. We are now facing other situations where pre- or post-processing is needed.

Special characters treatment

In OpenDocument, special Unicode characters are used for unbreakable spaces, soft hyphens and so on, whereas OpenXml uses XML tags. We could handle those conversions within our XSL files, but that would be very time-consuming (every single character would have to be checked). To avoid that, we implemented a filter that simply takes the output of the transformation and replaces the special characters with the appropriate XML elements.
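
To make the idea concrete, here is a minimal sketch of the replacement step, written in Python for brevity rather than in the converter's C#. The mapping table and the `split_run_text` helper are assumptions made for illustration; the real filter may cover other characters and is not implemented this way.

```python
import re

# Assumed mapping (illustrative, not the converter's actual table):
# Unicode character -> WordprocessingML element that replaces it.
SPECIAL_CHARS = {
    "\u00ad": "w:softHyphen",     # soft hyphen
    "\u2011": "w:noBreakHyphen",  # non-breaking hyphen
}
_SPLIT = re.compile("([" + "".join(SPECIAL_CHARS) + "])")


def split_run_text(text):
    """Split text into ("text", str) and ("element", name) pieces, in order."""
    pieces = []
    for chunk in _SPLIT.split(text):
        if not chunk:
            continue  # re.split yields empty strings around adjacent matches
        if chunk in SPECIAL_CHARS:
            pieces.append(("element", SPECIAL_CHARS[chunk]))
        else:
            pieces.append(("text", chunk))
    return pieces
```

A post-processor would then write each "text" piece as character data and each "element" piece as an empty element, without the XSL files ever having to inspect individual characters.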

Technically, each post-processor is an XmlWriter, and we just chain those XmlWriters one after the other (the output of the first one being the input of the second one and so on, the last one being the ZipArchiveWriter). That allows us to dedicate each post-processor to a specific task, while having very little impact on the overall performance of the converter. The only drawback of this method is that we can only work on the XML flow on the fly, without looking back or ahead. So far, that has not been a problem. And it shouldn't be for the next post-processor we'll have to implement: the one that will convert the automatic styles that we find in OpenDocument into run properties.
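
The chaining pattern can be sketched as follows. This is an illustration in Python using SAX event handlers, not the converter's C# code: `CharToElementWriter` and `convert` are hypothetical names, and the standard `XMLGenerator` stands in for the final ZipArchiveWriter.

```python
import io
from xml.sax import parseString
from xml.sax.handler import ContentHandler
from xml.sax.saxutils import XMLGenerator


class CharToElementWriter(ContentHandler):
    """One stage of the chain: forwards every event to the next writer,
    replacing one special character with an empty element on the way."""

    def __init__(self, downstream, char, element_name):
        super().__init__()
        self.downstream = downstream
        self.char = char
        self.element_name = element_name

    def startDocument(self):
        self.downstream.startDocument()

    def endDocument(self):
        self.downstream.endDocument()

    def startElement(self, name, attrs):
        self.downstream.startElement(name, attrs)

    def endElement(self, name):
        self.downstream.endElement(name)

    def characters(self, content):
        # Emit the text between occurrences, with an empty element in between.
        for index, part in enumerate(content.split(self.char)):
            if index > 0:
                self.downstream.startElement(self.element_name, {})
                self.downstream.endElement(self.element_name)
            if part:
                self.downstream.characters(part)


def convert(xml_bytes):
    """Push a document through two chained stages into a final serializer."""
    out = io.StringIO()
    # Last link of the chain (plays the role of the ZipArchiveWriter here).
    sink = XMLGenerator(out, encoding="utf-8", short_empty_elements=True)
    second = CharToElementWriter(sink, "\u2011", "w:noBreakHyphen")
    first = CharToElementWriter(second, "\u00ad", "w:softHyphen")
    parseString(xml_bytes, first)  # the parser plays the role of the XSL output
    return out.getvalue()
```

Each stage only sees one event at a time and hands it on immediately, which is why the chain costs little but also why a stage cannot look at the rest of the document.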

Password-protected documents

Another situation we are facing is password-protected documents. As you know, OpenDocument files are in fact ZIP files containing XML data. When you choose to protect an ODF document with a password, the XML files embedded in the ZIP archive are simply encrypted with a dedicated algorithm (Blowfish) before they are stored in the archive. So for our XSLT engine to be able to process those XML files, we need to decrypt them first. That will be done by a pre-processor.

Actually, we are already using a pre-processor to extract the files from the archive, but it is somewhat hidden by the "resolver" mechanism of the .NET framework: when instantiating the XSLT processor, we specify a custom resolver (called "ZipResolver") that must be used to find the needed resources. This custom resolver simply retrieves the streams from the ZIP archive. To handle password-protected files, we will insert a decoding mechanism inside the ZipResolver.
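
As a rough sketch of where the decoding step would sit, here is a Python stand-in for the resolver. Only the name "ZipResolver" comes from our code; the `get_entity` method and the `decode` hook are hypothetical, and the hook is a placeholder for the future decryption routine, not an implementation of Blowfish.

```python
import io
import zipfile


class ZipResolver:
    """Return entry streams from a ZIP archive, optionally decoding them first."""

    def __init__(self, archive, decode=None):
        self._zip = zipfile.ZipFile(archive)
        # Hypothetical hook: a callable taking the stored bytes and
        # returning the decrypted bytes. None means "not protected".
        self._decode = decode

    def get_entity(self, name):
        """Read one archive entry and hand it back as a readable stream."""
        data = self._zip.read(name)
        if self._decode is not None:
            data = self._decode(data)
        return io.BytesIO(data)
```

Because the XSLT processor only ever asks the resolver for streams, nothing else in the conversion would need to know whether a document was protected.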

You have certainly noticed that I'm speaking in the future tense. That's because we don't have any implementation of the Blowfish algorithm in C# yet... We need an Open Source implementation compatible with the BSD license, and there does not seem to be one. This is exactly the same issue we had for ZIP compression / decompression. For obvious reasons, we don't want to code a new implementation from scratch (it would be very time-consuming, with the risk of adding bugs or even security holes to our code). So the most reasonable solution may be to find a C library and add a C# wrapper around it, exactly as we did with zlib. If anybody has experience with this and wants to help us do the job, feel free to contact us!