While having to work on binary PowerPoint import/export recently, I found the support for “debugging” the actual file format a bit lacking (to say the least). Of course, for those with MSDN access there’s the magic FileViewer.exe, but that’s of limited use on those other platforms I fancy working on, plus one cannot easily extend it.
I was therefore looking enviously at Daniel’s biffdumper, and even more so at Kohei’s xls-dump.py, so I set about ripping the guts out of the latter and hacking up a ppt-dump.py, which was actually a fun project!
Basically, what this gives you is a human-readable (and diffable!) dump of binary ppt files, like this:
====================================================================
[DFF_msofbtClientTextbox]
(type: F00Dh inst: 0000h, vers: 000Fh, start: 2760, size: 127)
====================================================================
====================================================================
[DFF_PST_TextHeaderAtom]
(type: 0F9Fh inst: 0000h, vers: 0000h, start: 0, size: 4)
====================================================================
0F9Fh: -------------------------------------------------------------
0F9Fh: 01 00 00 00
0F9Fh: -------------------------------------------------------------
====================================================================
[DFF_PST_TextBytesAtom]
(type: 0FA8h inst: 0000h, vers: 0000h, start: 12, size: 45)
====================================================================
0FA8h: -------------------------------------------------------------
0FA8h: text: 'Text^MText^MText^MText^MText^MText^MText^MText^MText'
0FA8h: -------------------------------------------------------------
0FA8h: 54 65 78 74 0D 54 65 78 74 0D 54 65 78 74 0D 54
0FA8h: 65 78 74 0D 54 65 78 74 0D 54 65 78 74 0D 54 65
0FA8h: 78 74 0D 54 65 78 74 0D 54 65 78 74 0D
0FA8h: -------------------------------------------------------------
====================================================================
[DFF_PST_StyleTextPropAtom]
(type: 0FA1h inst: 0000h, vers: 0000h, start: 65, size: 22)
====================================================================
0FA1h: -------------------------------------------------------------
0FA1h: para props for 46 chars, indent: 0
0FA1h: para prop given: para linespacing 80
0FA1h: -------------------------------------------------------------
0FA1h: char props for 46 chars
0FA1h: char prop given: char font size 30
0FA1h: -------------------------------------------------------------
0FA1h: -------------------------------------------------------------
0FA1h: 2E 00 00 00 00 00 00 10 00 00 50 00 2E 00 00 00
0FA1h: 00 00 02 00 1E 00
0FA1h: -------------------------------------------------------------
====================================================================
[DFF_PST_TextSpecInfoAtom]
(type: 0FAAh inst: 0000h, vers: 0000h, start: 95, size: 24)
====================================================================
0FAAh: -------------------------------------------------------------
0FAAh: 2D 00 00 00 01 00 00 00 00 00 01 00 00 00 07 00
0FAAh: 00 00 00 00 09 08 00 00
0FAAh: -------------------------------------------------------------
[...]
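To show how the hex bytes in such a dump map to the decoded lines, here is a minimal Python sketch. It is not code from ppt-dump.py; the function names `parse_record_header` and `decode_style_text_prop` are made up for illustration. It assumes the usual 8-byte little-endian record header (4-bit version, 12-bit instance, 16-bit type, 32-bit size) and handles only the two property mask bits that occur in the StyleTextPropAtom above (0x1000 for paragraph line spacing, 0x20000 for character font size), with exactly one paragraph run and one character run.

```python
import struct

def parse_record_header(data, offset=0):
    """Split an 8-byte record header into (vers, inst, type, size)."""
    ver_inst, rec_type, size = struct.unpack_from("<HHI", data, offset)
    # Low 4 bits hold the version, the upper 12 bits the instance.
    return ver_inst & 0x000F, ver_inst >> 4, rec_type, size

# Assumption: only the two mask bits seen in the dump are handled here.
PARA_LINESPACING = 0x00001000
CHAR_FONT_SIZE = 0x00020000

def decode_style_text_prop(payload):
    """Decode a payload holding one paragraph run and one character run."""
    lines = []
    pos = 0
    # Paragraph run: char count (4 bytes), indent (2 bytes), mask (4 bytes).
    count, indent, mask = struct.unpack_from("<IHI", payload, pos)
    pos += 10
    lines.append("para props for %d chars, indent: %d" % (count, indent))
    if mask & PARA_LINESPACING:
        (spacing,) = struct.unpack_from("<h", payload, pos)
        pos += 2
        lines.append("para prop given: para linespacing %d" % spacing)
    # Character run: char count (4 bytes), mask (4 bytes).
    count, mask = struct.unpack_from("<II", payload, pos)
    pos += 8
    lines.append("char props for %d chars" % count)
    if mask & CHAR_FONT_SIZE:
        (size,) = struct.unpack_from("<H", payload, pos)
        pos += 2
        lines.append("char prop given: char font size %d" % size)
    return lines

# Header bytes rebuilt by hand from the "(type: 0FA1h ... size: 22)" line.
header = bytes.fromhex("00 00 A1 0F 16 00 00 00")
# Payload bytes copied from the 0FA1h hex lines of the dump.
payload = bytes.fromhex(
    "2E 00 00 00 00 00 00 10 00 00 50 00 2E 00 00 00"
    "00 00 02 00 1E 00"
)

vers, inst, rec_type, size = parse_record_header(header)
print("(type: %04Xh inst: %04Xh, vers: %04Xh, size: %d)" % (rec_type, inst, vers, size))
for line in decode_style_text_prop(payload):
    print(line)
```

A real parser would have to loop over several runs and honour every mask bit in the order the format defines; this sketch covers just enough to reproduce the four decoded lines shown for the 0FA1h record.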
Kudos to Kohei for his great work on xls_dump, of which I reused the structure and, most importantly, the BIFF record parsing. I’m not aware of any other tool like ppt-dump.py, but would of course be interested to hear of one.
Other than that, here’s a brief list of other FLOSS tools for MSO binary document handling I’m aware of (besides OOo, of course):
Update: merge with xls_dump done, adapted viewvc links