This is an update of my PDF tools.
There are a couple of bug fixes for pdf-parser and pdfid.
And 2 new features in pdf-parser, inspired by a private training on maldoc analysis I gave last week. I often get good ideas from my students, and sometimes, even I get a good idea in class .
Option -o can now be used to select multiple objects: separate the indices by a comma.
There’s a new environment variable, PDFPARSER_OPTIONS, that can be used to provide extra options you want to include with each execution of pdf-parser.py. This is useful for option -O, an option to parse stream objects.
It’s actually best to always parse stream objects, i.e. always use option -O. But I decided not to make this an option that is on by default, so that the behavior of pdf-parser would remain unchanged. I consider this important for the many people that rely on a predictable behavior of pdf-parser, like teachers and students of infosec trainings where my tools are used/mentioned.
However, always including option -O is tedious and error prone. So now you can have best of both worlds, by defining an environment variable with name PDFPARSER_OPTIONS and value -O.
And finally, I started to add a man page (option -m), like I do with many of my other tools. This is a work in progress: for the moment, it points to my free PDF analysis e-book that explains the use of pdfid and pdf-parser.