Something I've been interested in for quite some time is the metadata embedded in various file formats used on Windows systems. That interest has cracked the shell a bit in more than a few cases, giving me a peek inside what may be going on when someone creates a file for malicious purposes. I've not only been able to pull apart OLE format MS documents (MS Word .doc files), but also the OLE objects embedded inside the newer .docx format files.
One tool I like to use is Didier Stevens' oledump.py, in part because it provides a great deal of functionality, particularly where it will decompress VBA macros. However, I will also use my own OLE parser, because it pulls some metadata that I tend to find very useful when developing a view into an adversary's activities. Also, there is the potential for additional data to be extracted from areas that other tools simply do not address.
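To give a sense of where a homegrown OLE parser starts, here's a minimal sketch of validating the compound file header; the signature and field offsets come from Microsoft's [MS-CFB] specification, and the function name is mine, not from any particular tool.

```python
import struct

# Per [MS-CFB], every OLE/compound file begins with this 8-byte signature
OLE_SIGNATURE = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"

def check_ole_header(data):
    """Validate an OLE compound file header; return the sector size, or
    None if the data isn't an OLE file."""
    if len(data) < 512 or data[:8] != OLE_SIGNATURE:
        return None
    # SectorShift lives at offset 30 (2 bytes, little-endian);
    # sector size is 2**SectorShift (512 for v3 files, 4096 for v4)
    sector_shift = struct.unpack_from("<H", data, 30)[0]
    return 2 ** sector_shift
```

Everything else in the format (the FAT, directory entries, and the SummaryInformation streams that hold the metadata) hangs off of values read from this header, which is why a malformed header is the first thing a parser should catch.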
An example of simple metadata extraction came from the Mia Ash persona that Allison Wikoff (an excellent intel analyst at SecureWorks) unraveled. I didn't have much at all to do with the amazing work that Allison did...all I did was look at the metadata associated with a document sent to the victims. However, from that small piece of information came some pretty amazing insights.
Another, much older example of how metadata can be extracted and used comes from the Blair case. That's all I'll say about that one.
Another file format that I've put some work into understanding and unraveling is the LNK file format, which MS has done a very good job of documenting.
There are a number of FOSS tools available for parsing the binary LNK file format and displaying the various fields. However, during some recent testing, I noticed files submitted to VirusTotal that had apparently "done their job" (i.e., been successfully used to infect systems), but on which the parsing tools failed at various points while dissecting the binary contents. Tools that completed all of their parsing before displaying any fields failed completely, while those that displayed fields as they were parsed appeared to fail only at a certain point. Understanding the file format allowed me to identify where the tools appeared to be failing, and in this way, I'm not simply going back to someone who wrote a tool with, "...it didn't work." I know exactly what happened and why, and more importantly, can address it. As a result of this work, I have also seen that some aspects of these file formats can be abused without degrading the base functionality offered via the format.
This is where coding in DFIR comes in...using a hex editor, I was able to see where and why the tools were having issues, and I could also see that there really wasn't anything any of the tools could do about it. What I mean is, when parsing Extra Data Blocks within an LNK file (in particular, the TrackerDataBlock), what would be the "machine ID" or NetBIOS name of the system on which the LNK file was created extends beyond the bounds of the size value for that section. Yes, the machine ID value is of variable length (per the specification), but it also represents the NetBIOS name of a Windows system, so there's an expectation (however inaccurate) that it won't extend beyond the maximum length of a NetBIOS name.
In some of the test cases I observed (files downloaded from VirusTotal using the "tag: lnk" search term), the machine ID field was several bytes longer than expected, pushing the total length of the TrackerDataBlock out beyond its stated size. This causes tools to fail, and to begin reading the droids from the wrong location. In other instances, what would be the machine ID was a string of hex characters extending beyond the length of the TrackerDataBlock, with no apparent droid fields visible via a hex editor.
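To make the anomaly concrete, here's a sketch of parsing a TrackerDataBlock that flags a machine ID overrunning its field; the layout (BlockSize, signature 0xA0000003, then a 16-byte NetBIOS machine ID followed by the droids) follows the [MS-SHLLINK] specification, while the function name and return structure are simply my own illustration.

```python
import struct

TRACKER_SIGNATURE = 0xA0000003  # TrackerDataBlock, per [MS-SHLLINK]

def parse_tracker_block(data, offset):
    """Parse a TrackerDataBlock at the given offset, flagging a machine
    ID that is not NULL-terminated within its 16-byte field."""
    size, sig, length, version = struct.unpack_from("<4I", data, offset)
    if sig != TRACKER_SIGNATURE:
        return None
    machine_id_raw = data[offset + 16:offset + 32]
    # The machine ID is a NetBIOS name, so a well-formed block has a NULL
    # terminator inside these 16 bytes; if it doesn't, a parser that
    # trusts the string will read the droids from the wrong location
    anomalous = b"\x00" not in machine_id_raw
    machine_id = machine_id_raw.split(b"\x00")[0].decode("ascii", "replace")
    droid_raw = data[offset + 32:offset + 96]  # Droid + DroidBirth (4 GUIDs)
    return {"size": size, "machine_id": machine_id,
            "droid_raw": droid_raw, "anomalous": anomalous}
```

A parser built this way can keep going after reporting the anomaly, rather than dying at the malformed field the way the tools I tested did.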
These are possibly modifications of some kind, or errors in the tool used to create the LNK files (for an example of such a tool, go here). Either way, could these be considered 'tool marks' that could be used to track LNK files being deployed across campaigns?
Am I suggesting that everyone, every examiner needs to be proficient at coding and knowledgeable of file formats? No, not at all. In this blog post, I'm simply elaborating on a previous response, and illustrating how having an understanding of coding can be helpful.
Other examples of how coding in DFIR is useful include not just being able to automate significant portions of your workflow, but also being able to understand what an attacker may have been trying to achieve. I've engaged in a number of IR engagements over the years where attackers have left batch files or Visual Basic scripts behind, and for the most part, they were easy to understand, and very useful. However, every now and then, there would be a relatively un- or under-used command that required some research, or something "new".
Another example is decoding weaponized MSWord documents. Not long ago, I unraveled a macro embedded in an MSWord document, and having some ability to code not only helped me do the unraveling, but also helped me see what was actually being done. The macro had multiple layers of obfuscation, including base64 encoding, character encoding (i.e., using the character number from an ASCII table instead of the character itself, and decoding via chr()...), as well as a few other tricks. At one point, there was even a base64-encoded string embedded within a script that was, itself, base64 encoded.
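As a purely hypothetical illustration (not the actual macro), peeling two of those layers, base64 followed by chr()-style character codes, might look like this in Python:

```python
import base64

def decode_layers(blob):
    """Peel two common obfuscation layers: base64, then comma-separated
    ASCII character codes (the VBA chr() trick)."""
    # Layer 1: base64 decode the outer blob
    decoded = base64.b64decode(blob).decode("ascii")
    # Layer 2: turn numbers like "104,116,116,112" back into "http"
    return "".join(chr(int(n)) for n in decoded.split(","))
```

Real samples stack more layers (and nest encodings inside encodings, as with the base64 string embedded in a base64-encoded script), but each layer tends to fall to a few lines of code like these once you recognize it.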
So, in summary, having some capability to code, at some level, is extremely valuable within the DFIR community...that, or knowing someone. In fact, having someone (or a few someones) you can go to for assistance is just helpful all around.