
Root Beer

Chris Nokleberg's Fizzy Weblog

October 2005
# Please document EMF+

As part of our Java PowerPoint renderer I've had to decode all of the image formats that can be embedded within PowerPoint, including the three metafile formats: WMF, EMF, and PICT. Now, I could write a whole essay on how ugly these formats are, but it wouldn't really be fair. All three are mostly just straight dumps of API method calls and in-memory records to disk. A fine choice at the time, I'm sure, but it means that to render them accurately you have to reimplement a big chunk of Windows GDI and Color QuickDraw.

On the Windows platform, GDI+ promised to make a lot of the headaches of GDI go away. First and foremost, GDI+ is much more device-independent than GDI. For example, GDI+ did away with raster operations, which are particularly difficult to deal with when converting to modern vector formats like PDF or SVG (although people have made valiant efforts). GDI+ also introduced a new metafile format, EMF+. Because EMF+ uses the new GDI+ records, it should be much nicer to work with. Some other advantages of EMF+ over EMF:

  • Gradient fills are represented mathematically, instead of by hundreds of tiny polygons (such "fake" gradients are very common in EMF files, and prevent them from being anti-aliased without serious artifacts)
  • Embedded images are stored as PNG or JPEG, instead of BMP
  • Support for full transparency, instead of faking binary transparency using bitmasks and raster operations

Unfortunately, EMF+ is almost completely undocumented. People ask about it on the newsgroups from time to time, but the closest thing to an answer from Microsoft that I could find is three years old. Luckily, in my searches I did find Jeremy Todd, who, as it turned out, had reverse-engineered a good chunk of the EMF+ records himself. There are still a number of missing pieces, some of them critical, but it should be a very good base on which to build. One of the reasons for this blog entry is to get the search engines to index his page!

What does this have to do with PowerPoint? There is another variation of EMF+, called EMF+ Dual. Dual metafiles are EMF files with EMF+ records hidden inside EMF comments. This lets applications that only understand EMF render them as normal, while applications that know to look for the special EMF+ comments can extract the EMF+ records and render from those instead. In recent versions of Microsoft Office, all generated EMFs are actually EMF+ Dual metafiles, including all the charts in Excel and PowerPoint.
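To make the "hidden inside EMF comments" idea concrete, here is a minimal Java sketch of how a reader might pull the EMF+ payloads out of a Dual metafile. It rests on assumptions that are not stated in this post: that every EMF record starts with a little-endian 32-bit type followed by a 32-bit total size, that comment records use type 70, that a comment carries a 32-bit data size, and that EMF+ comments begin their data with the four bytes "EMF+". The class and method names are hypothetical, and the sketch does not try to interpret the EMF+ records themselves.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.List;

class EmfPlusScanner {
    // Assumed record type values (not taken from this post).
    static final int EMR_EOF = 14;
    static final int EMR_COMMENT = 70;
    // The bytes 'E','M','F','+' read as a little-endian 32-bit integer.
    static final int EMFPLUS_TAG = 0x2B464D45;

    /** Returns the raw EMF+ payload of every EMF+ comment record, in file order. */
    static List<byte[]> extractEmfPlus(byte[] emf) {
        ByteBuffer buf = ByteBuffer.wrap(emf).order(ByteOrder.LITTLE_ENDIAN);
        List<byte[]> payloads = new ArrayList<>();
        int pos = 0;
        while (pos + 8 <= emf.length) {
            int type = buf.getInt(pos);
            int size = buf.getInt(pos + 4);      // total record size, header included
            if (size < 8 || size > emf.length - pos) {
                break;                           // malformed or truncated record
            }
            if (type == EMR_COMMENT && size >= 16) {
                int dataSize = buf.getInt(pos + 8);
                boolean fits = dataSize >= 4 && dataSize <= size - 12;
                if (fits && buf.getInt(pos + 12) == EMFPLUS_TAG) {
                    // Everything after the 4-byte tag is EMF+ record data.
                    byte[] payload = new byte[dataSize - 4];
                    System.arraycopy(emf, pos + 16, payload, 0, payload.length);
                    payloads.add(payload);
                }
            }
            if (type == EMR_EOF) {
                break;                           // end-of-file record reached
            }
            pos += size;
        }
        return payloads;
    }
}
```

A plain EMF renderer never needs any of this: it simply skips comment records, which is why the same file still draws correctly there. Calling `EmfPlusScanner.extractEmfPlus(bytes)` on a file with no EMF+ comments returns an empty list.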

The upshot is that currently only Microsoft can leverage its knowledge of the EMF+ records to produce better-looking and more compact output when converting documents to PDF. Even after the transition to the new XML formats, image objects, including EMF, will exist as binary files within the container ZIP. Hopefully Microsoft will see fit to finally document EMF+, because otherwise I don't see how the presence of secret binary data can be reconciled with the stated goal of an open format.
