
Monthly Archives: January 2009

While working on binary PowerPoint import/export recently, I found the tooling for “debugging” the actual file format a bit lacking (to say the least). Of course, for those with MSDN access there’s the magic FileViewer.exe, but that’s of limited use on the other platforms I fancy working on, and it cannot easily be extended.

So I was looking enviously at Daniel’s biffdumper, and even more so at Kohei’s xls-dump.py, and set off ripping the guts out of the latter and hacking up a ppt-dump.py, which was actually a fun project!

Basically, what this gives you is a human-readable (and diffable!) dump of binary ppt files, like this:

====================================================================
[DFF_msofbtClientTextbox]
(type: F00Dh inst: 0000h, vers: 000Fh, start: 2760, size: 127)
====================================================================

 ====================================================================
 [DFF_PST_TextHeaderAtom]
 (type: 0F9Fh inst: 0000h, vers: 0000h, start: 0, size: 4)
 ====================================================================

 0F9Fh: -------------------------------------------------------------
 0F9Fh: 01 00 00 00 
 0F9Fh: -------------------------------------------------------------

 ====================================================================
 [DFF_PST_TextBytesAtom]
 (type: 0FA8h inst: 0000h, vers: 0000h, start: 12, size: 45)
 ====================================================================

 0FA8h: -------------------------------------------------------------
 0FA8h: text: 'Text^MText^MText^MText^MText^MText^MText^MText^MText'
 0FA8h: -------------------------------------------------------------
 0FA8h: 54 65 78 74 0D 54 65 78 74 0D 54 65 78 74 0D 54 
 0FA8h: 65 78 74 0D 54 65 78 74 0D 54 65 78 74 0D 54 65 
 0FA8h: 78 74 0D 54 65 78 74 0D 54 65 78 74 0D 
 0FA8h: -------------------------------------------------------------

 ====================================================================
 [DFF_PST_StyleTextPropAtom]
 (type: 0FA1h inst: 0000h, vers: 0000h, start: 65, size: 22)
 ====================================================================

 0FA1h: -------------------------------------------------------------
 0FA1h: para props for 46 chars, indent: 0
 0FA1h: para prop given: para linespacing 80
 0FA1h: -------------------------------------------------------------
 0FA1h: char props for 46 chars
 0FA1h: char prop given: char font size 30
 0FA1h: -------------------------------------------------------------
 0FA1h: -------------------------------------------------------------
 0FA1h: 2E 00 00 00 00 00 00 10 00 00 50 00 2E 00 00 00 
 0FA1h: 00 00 02 00 1E 00 
 0FA1h: -------------------------------------------------------------

 ====================================================================
 [DFF_PST_TextSpecInfoAtom]
 (type: 0FAAh inst: 0000h, vers: 0000h, start: 95, size: 24)
 ====================================================================

 0FAAh: -------------------------------------------------------------
 0FAAh: 2D 00 00 00 01 00 00 00 00 00 01 00 00 00 07 00 
 0FAAh: 00 00 00 00 09 08 00 00 
 0FAAh: -------------------------------------------------------------

[...]
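
To make the StyleTextPropAtom part of the dump a bit more tangible, here is a minimal sketch that decodes the 22 payload bytes shown above by hand. It is not the code from ppt-dump.py; the function name and the two mask constants are my own assumptions about the layout (one paragraph run followed by one character run), and it only handles the two properties that actually occur in this sample.

```python
import struct

# Payload bytes copied from the DFF_PST_StyleTextPropAtom hex dump above.
PAYLOAD = bytes.fromhex(
    "2E 00 00 00 00 00 00 10 00 00 50 00 2E 00 00 00 "
    "00 00 02 00 1E 00"
)

# Mask bits as assumed for this sketch; only these two are handled.
PARA_LINESPACING = 0x00001000
CHAR_FONT_SIZE = 0x00020000


def decode_style_text_prop(data):
    """Decode one paragraph run and one character run (hypothetical helper)."""
    pos = 0

    # Paragraph run: uint32 char count, uint16 indent, uint32 property mask.
    para_chars, indent, para_mask = struct.unpack_from("<IHI", data, pos)
    pos += 10
    if para_mask & ~PARA_LINESPACING:
        raise ValueError("unhandled paragraph mask bits: %#x" % para_mask)
    linespacing = None
    if para_mask & PARA_LINESPACING:
        (linespacing,) = struct.unpack_from("<h", data, pos)
        pos += 2

    # Character run: uint32 char count, uint32 property mask.
    char_chars, char_mask = struct.unpack_from("<II", data, pos)
    pos += 8
    if char_mask & ~CHAR_FONT_SIZE:
        raise ValueError("unhandled character mask bits: %#x" % char_mask)
    font_size = None
    if char_mask & CHAR_FONT_SIZE:
        (font_size,) = struct.unpack_from("<H", data, pos)
        pos += 2

    return {
        "para_chars": para_chars,
        "indent": indent,
        "linespacing": linespacing,
        "char_chars": char_chars,
        "font_size": font_size,
        "consumed": pos,
    }


result = decode_style_text_prop(PAYLOAD)
print(result)
```

The decoded values line up with what the dump prints: 46 chars for both runs, indent 0, linespacing 80 and font size 30, with all 22 bytes consumed.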

Kudos to Kohei for his great work on xls_dump, whose structure and, most importantly, BIFF record parsing I reused. I’m not aware of anything else like ppt-dump.py, but would of course be interested to hear if there is.
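
For the curious, the record walking behind the `(type: … inst: … vers: … start: … size: …)` lines can be sketched roughly like this. This is an illustration only, not the actual xls_dump or ppt-dump.py code: `iter_records` is a hypothetical helper, and it assumes the usual 8-byte little-endian record header (4 bits version, 12 bits instance, 16 bits type, 32 bits payload length). The sample stream is rebuilt from the first two atoms in the dump.

```python
import struct

HEADER_SIZE = 8  # assumed: uint16 vers/inst, uint16 type, uint32 length


def iter_records(data):
    """Yield the records found back to back in `data` (no recursion)."""
    pos = 0
    while pos + HEADER_SIZE <= len(data):
        ver_inst, rec_type, size = struct.unpack_from("<HHI", data, pos)
        payload_start = pos + HEADER_SIZE
        yield {
            "type": rec_type,
            "vers": ver_inst & 0x000F,  # low 4 bits
            "inst": ver_inst >> 4,      # high 12 bits
            "start": pos,               # offset of the record header
            "size": size,               # payload length, header excluded
            "payload": data[payload_start:payload_start + size],
        }
        pos = payload_start + size


# Sample stream rebuilt from the TextHeaderAtom and TextBytesAtom above.
stream = (
    struct.pack("<HHI", 0x0000, 0x0F9F, 4) + bytes([1, 0, 0, 0])
    + struct.pack("<HHI", 0x0000, 0x0FA8, 45) + b"Text\r" * 9
)

records = list(iter_records(stream))
for rec in records:
    print("(type: %04Xh inst: %04Xh, vers: %04Xh, start: %d, size: %d)"
          % (rec["type"], rec["inst"], rec["vers"], rec["start"], rec["size"]))
```

Records with vers 000Fh (like the DFF_msofbtClientTextbox at the top of the dump) are containers whose payload is itself a sequence of records, so a real dumper would call the same routine recursively on that payload; the sketch leaves that out.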

Other than that, here’s a brief list of other FLOSS tools for MSO binary document handling I’m aware of (besides OOo, of course):

Update: merge with xls_dump done, adapted viewvc links
