While having to work on binary PowerPoint import/export recently, I found the support for “debugging” the actual file format a bit lacking (to say the least). Of course, for those with MSDN access there’s the magic FileViewer.exe, but that’s of limited use on those other platforms I fancy working on, plus one cannot easily extend it.
I was therefore enviously looking at Daniel’s biffdumper and even more at Kohei’s xls-dump.py, and thusly set off ripping the guts out of the latter & hacking up a ppt-dump.py – which was a fun project actually!
Basically, what this gives you is a human-readable (and diffable!) dump of binary ppt files, like this:
====================================================================
[DFF_msofbtClientTextbox] (type: F00Dh inst: 0000h, vers: 000Fh, start: 2760, size: 127)
====================================================================
====================================================================
[DFF_PST_TextHeaderAtom] (type: 0F9Fh inst: 0000h, vers: 0000h, start: 0, size: 4)
====================================================================
0F9Fh: -------------------------------------------------------------
0F9Fh: 01 00 00 00
0F9Fh: -------------------------------------------------------------
====================================================================
[DFF_PST_TextBytesAtom] (type: 0FA8h inst: 0000h, vers: 0000h, start: 12, size: 45)
====================================================================
0FA8h: -------------------------------------------------------------
0FA8h: text: 'Text^MText^MText^MText^MText^MText^MText^MText^MText'
0FA8h: -------------------------------------------------------------
0FA8h: 54 65 78 74 0D 54 65 78 74 0D 54 65 78 74 0D 54
0FA8h: 65 78 74 0D 54 65 78 74 0D 54 65 78 74 0D 54 65
0FA8h: 78 74 0D 54 65 78 74 0D 54 65 78 74 0D
0FA8h: -------------------------------------------------------------
====================================================================
[DFF_PST_StyleTextPropAtom] (type: 0FA1h inst: 0000h, vers: 0000h, start: 65, size: 22)
====================================================================
0FA1h: -------------------------------------------------------------
0FA1h: para props for 46 chars, indent: 0
0FA1h: para prop given: para linespacing 80
0FA1h: -------------------------------------------------------------
0FA1h: char props for 46 chars
0FA1h: char prop given: char font size 30
0FA1h: -------------------------------------------------------------
0FA1h: -------------------------------------------------------------
0FA1h: 2E 00 00 00 00 00 00 10 00 00 50 00 2E 00 00 00
0FA1h: 00 00 02 00 1E 00
0FA1h: -------------------------------------------------------------
====================================================================
[DFF_PST_TextSpecInfoAtom] (type: 0FAAh inst: 0000h, vers: 0000h, start: 95, size: 24)
====================================================================
0FAAh: -------------------------------------------------------------
0FAAh: 2D 00 00 00 01 00 00 00 00 00 01 00 00 00 07 00
0FAAh: 00 00 00 00 09 08 00 00
0FAAh: -------------------------------------------------------------
[...]
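For readers curious where the type, inst, vers, start and size fields in such a dump come from, here is a minimal Python sketch of walking the record stream. It is not taken from ppt-dump.py; the names `parse_record_header`, `walk_records` and `make_record` are made up for illustration. It assumes the usual 8-byte little-endian record header (a 16-bit word holding a 4-bit version and a 12-bit instance, a 16-bit type, a 32-bit payload length), and assumes that a version of 0xF marks a container whose payload is itself a sequence of records. The sample bytes are rebuilt from the hex shown in the dump above.

```python
import struct

def parse_record_header(data, offset=0):
    """Decode the assumed 8-byte record header found at `offset`."""
    ver_inst, rec_type, size = struct.unpack_from("<HHI", data, offset)
    return {
        "vers": ver_inst & 0x000F,   # low 4 bits: record version
        "inst": ver_inst >> 4,       # high 12 bits: record instance
        "type": rec_type,
        "size": size,                # payload length, header excluded
    }

def walk_records(data, offset=0, end=None, depth=0):
    """Yield (depth, start, header) per record, descending into containers."""
    if end is None:
        end = len(data)
    while offset + 8 <= end:
        hdr = parse_record_header(data, offset)
        yield depth, offset, hdr
        body = offset + 8
        if hdr["vers"] == 0xF:  # assumption: 0xF flags a container record
            yield from walk_records(data, body, min(body + hdr["size"], end), depth + 1)
        offset = body + hdr["size"]

def make_record(rec_type, body, inst=0, vers=0):
    """Hypothetical helper: prepend a header to `body` (used for the sample only)."""
    return struct.pack("<HHI", (inst << 4) | vers, rec_type, len(body)) + body

# Payload of the client textbox, rebuilt from the hex lines in the dump.
payload = b"".join([
    make_record(0x0F9F, bytes.fromhex("01 00 00 00")),
    make_record(0x0FA8, b"Text\r" * 9),
    make_record(0x0FA1, bytes.fromhex(
        "2E 00 00 00 00 00 00 10 00 00 50 00 2E 00 00 00 00 00 02 00 1E 00")),
    make_record(0x0FAA, bytes.fromhex(
        "2D 00 00 00 01 00 00 00 00 00 01 00 00 00 07 00 00 00 00 00 09 08 00 00")),
])
textbox = make_record(0xF00D, payload, vers=0xF)

for depth, start, hdr in walk_records(payload):
    print(f"type: {hdr['type']:04X}h inst: {hdr['inst']:04X}h, "
          f"vers: {hdr['vers']:04X}h, start: {start}, size: {hdr['size']}")
```

Walking `payload` reports the four atoms at starts 0, 12, 65 and 95, the same offsets the dump shows inside the 127-byte textbox container. Walking `textbox` instead yields the container first and then the same atoms one level deeper.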
Kudos to Kohei for his great work on xls_dump, of which I reused the structure and, most importantly, the biff record parsing. I’m not aware of any other tool like ppt-dump.py, but would of course be interested to hear if there is one.
Other than that, here’s a brief list of other FLOSS tools for MSO binary document handling I’m aware of (besides OOo, of course):
Update: merge with xls_dump done, adapted viewvc links