One thing I teach my students in my static reverse engineering training classes is to exploit information that programmers have left in the binaries for debugging purposes. The most obvious example of this is when the program contains a debug logging function, where one of the parameters is the function's name. In this case, the reverse engineer can write a script to extract the function names and automatically rename the calling function, thus saving valuable time and making their lives easier. (I teach several other example scenarios.)
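As a rough sketch of the renaming logic in that logging scenario, suppose the call sites have already been collected as pairs of (start address of the calling function, name string passed to the logger). How those pairs are gathered is left out here, and the helper name, addresses, and strings below are made up for illustration.

```python
from collections import defaultdict

def names_from_log_calls(call_sites):
    """Map each calling function to the one name its log calls agree on.

    call_sites: iterable of (caller_start_address, name_string) pairs,
    assumed to have been collected from calls to the logging function.
    """
    seen = defaultdict(set)
    for caller, name in call_sites:
        seen[caller].add(name)
    # Keep only callers whose log calls all report the same name. A caller
    # that logs several different names is ambiguous, so it is skipped.
    return {caller: next(iter(names))
            for caller, names in seen.items()
            if len(names) == 1}

# Made-up addresses and names, for illustration only.
sites = [
    (0x401000, "parse_header"),
    (0x401000, "parse_header"),
    (0x401200, "read_block"),
    (0x401400, "open_file"),
    (0x401400, "close_file"),  # conflicting names, so 0x401400 is skipped
]
renames = names_from_log_calls(sites)
```

Skipping the ambiguous callers is a deliberate choice in this sketch: a wrong name in the database tends to cost more time than a missing one.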
Inspired by my recent forays into the Hex-Rays API, today I found a new method for exploiting such ad hoc debug-type information. This entry will demonstrate how to use the Hex-Rays CTREE API to extract information and automatically rename functions as a result, which would otherwise be more cumbersome and error-prone to do with the ordinary IDA SDK. The code is available here.
In particular, today I was reverse engineering the Hex-Rays decompiler itself. The Hex-Rays API is unusual in that the user-accessible API functions are not defined as exports from hexrays.dll. (In fact, hexrays.dll exports only the standard DLL entrypoint, as well as the standard plugin_t structure required of all IDA plugins.) Instead, the "exported" API methods are all declared as small inline functions that each invoke the same function, namely "hexdsp". The first parameter to that function is an enumeration element of type "hexcall_t", which tells the Hex-Rays kernel which API is being called, and the subsequent parameters are then specific to the called API. Inside of hexrays.dll, the hexdsp function performs a switch over the hexcall_t parameter, and then invokes the specified function.
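The dispatch pattern itself is easy to model. The following is only a toy Python analogue of the idea, not the real C++ declarations: the enumeration members and the two implementation functions are hypothetical stand-ins.

```python
from enum import IntEnum

class HexCall(IntEnum):
    # Hypothetical stand-ins for hexcall_t enumeration elements.
    HX_FIRST = 0
    HX_SECOND = 1

def _first_impl(x):
    return x + 1

def _second_impl(a, b):
    return a * b

def hexdsp(code, *args):
    """Single entry point: switch on the code, forward the other arguments."""
    if code == HexCall.HX_FIRST:
        return _first_impl(*args)
    if code == HexCall.HX_SECOND:
        return _second_impl(*args)
    raise ValueError("unknown call code: %r" % (code,))

# The "exported" API is nothing more than thin wrappers over the dispatcher.
def first(x):
    return hexdsp(HexCall.HX_FIRST, x)

def second(a, b):
    return hexdsp(HexCall.HX_SECOND, a, b)
```

Only `hexdsp` needs to be reachable from outside. The implementations behind it stay anonymous, which is exactly why they show up unnamed in the database.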
To be concrete about it, the last 3500 lines of hexrays.hpp all look roughly like this:
If the API functions had been exported from the DLL, IDA would have automatically renamed them for us, which would have made trivial the task of finding the APIs that we wanted to reverse engineer. Instead, almost none of the functions in the database have meaningful names.
The lack of meaningful function names does not prove to be a huge impediment to reverse engineering Hex-Rays API functions. We simply need to locate the hexdsp function in hexrays.dll, and locate the switch case that corresponds to the hexcall_t enumeration element that interests us. Most of these switch cases are one-liners that immediately invoke the relevant API function.
To find hexdsp, we can textually search the disassembly listing for the word "switch", which IDA inserts automatically as a comment upon encountering a compiled switch construct.
Next, we can filter the search by the word "cases" to obtain the beginning of each switch construct, as well as the number of cases that the switch contains.
Finally, we can count the number of enumeration elements in hexcall_t, and look for a switch with roughly that many cases. (For this, I simply selected the contents of the hexcall_t enumeration in my text editor, and looked at the "lines selected" display at the bottom.) hexcall_t contains 532 enumeration elements, and one of the switches has exactly 532 cases. That must be hexdsp.
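The same search can be done over a listing saved to a text file. This is a minimal sketch, assuming IDA's switch comments have the form "switch N cases"; the sample lines, addresses, and helper names are made up.

```python
import re

# Assumed comment format: "; switch 532 cases".
SWITCH_RE = re.compile(r";\s*switch\s+(\d+)\s+cases")

def find_switches(listing_lines):
    """Return (line, case_count) pairs for lines carrying a switch comment."""
    found = []
    for line in listing_lines:
        match = SWITCH_RE.search(line)
        if match:
            found.append((line.strip(), int(match.group(1))))
    return found

def closest_switch(listing_lines, expected_cases):
    """Pick the switch whose case count is nearest to the expected count."""
    switches = find_switches(listing_lines)
    if not switches:
        return None
    return min(switches, key=lambda s: abs(s[1] - expected_cases))

# Made-up listing lines, for illustration only.
listing = [
    ".text:17001234  cmp  ecx, 0Bh               ; switch 12 cases",
    ".text:1700123A  jmp  ds:off_17001300[ecx*4] ; switch jump",
    ".text:1700ABCD  cmp  eax, 213h              ; switch 532 cases",
    ".text:1700ABD8  jmp  ds:off_1700B000[eax*4] ; switch jump",
]
best = closest_switch(listing, 532)
```

Matching on the nearest count rather than an exact one allows for a miscounted enumeration or a header that is a version off from the binary.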
Now, to reverse engineer any particular API function, we just need to find the value of the associated constant in the hexcall_t enumeration, and examine that particular switch case. IDA can offer us some more assistance here, also. I begin by copying and pasting the hexcall_t enumeration into its own text file and saving it:
Next, I use IDA's C header parsing functionality to automatically create an enumeration that I can use inside of IDA. If successful, we get a message box telling us so.
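Outside of IDA, the same name-to-value mapping can be recovered from the saved text file with a few lines of Python. This is only a sketch: it assumes the file holds just the comma-separated enumerators (without the surrounding `enum hexcall_t { ... }`), it handles only simple integer literals as explicit values, and the enumerator names in the sample are invented.

```python
import re

def parse_enum_body(text):
    """Map enumerator names to values, following C's implicit numbering."""
    # Strip /* ... */ and // ... comments before splitting on commas.
    text = re.sub(r"/\*.*?\*/", "", text, flags=re.S)
    text = re.sub(r"//[^\n]*", "", text)
    values = {}
    next_value = 0
    for item in text.split(","):
        item = item.strip()
        if not item:
            continue
        name, _, explicit = item.partition("=")
        if explicit.strip():
            # Base 0 accepts decimal and 0x-prefixed literals.
            next_value = int(explicit.strip(), 0)
        values[name.strip()] = next_value
        next_value += 1
    return values

# Invented enumerator names, for illustration only.
sample = """
  hx_example_first,      // implicit value 0
  hx_example_second,     /* implicit value 1 */
  hx_example_third = 0x10,
  hx_example_fourth,
"""
enum_values = parse_enum_body(sample)
```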
In IDA's Enumerations window, we now need to add the enumeration we just imported. Press Insert to bring up the "Add enum type" window, and then use the "Add standard enum by enum name" option near the bottom. It will bring up a list of all currently-known enumerations; the one we just imported will be at the bottom.
We can use this enumeration to make the decompilation listing prettier. The decompiled switch statement currently shows the hexcall_t enumeration elements as integers. We can ask Hex-Rays to display them as enumeration element names instead, by placing our cursor on one of the integers and pressing 'M'.
Select the hexcall_t entry from the dialog that pops up and press enter. Voila, now we see the enumerated names in the decompilation listing:
So now, to reverse engineer one of the Hex-Rays API functions, our lives are even easier. The switch now has human-readable names for the enumeration elements; we just find the one we're interested in, and look at the function called in that switch case. To wit, here's gen_microcode():
Of course, finding the correct switch case is still tedious, given that there are 532 of them. I thought that the nicest solution would be to rename the functions after the name of the hexcall_t enumeration element in the corresponding switch case. Doing this by hand for all 532 hexcall_t enumeration elements would be a waste of time, so we should automate it instead.
For this task, I decided to use the Hex-Rays CTREE API via IDAPython, which I discussed somewhat in my recent blog entry about the Hex-Rays Microcode API. My observation was that we could extract the case number from the case statements, get the name of the pertinent enumeration element, extract the address of the called function from the first line of code in the case body (if the first line does call a function), and then rename the called address after the enumeration element. Using the Hex-Rays API saves us from having to write nasty, brittle code to search the disassembly listing instruction-by-instruction, and also makes it convenient for us to access the switch case numbers and their associated case statement bodies.
Our script will make use of so-called "CTREE visitor" objects. Our CTREE visitor classes contain methods such as "visit_insn" and "visit_expr", which Hex-Rays will invoke for every element in the CTREE listing.
The first visitor, which I called SwitchExaminer, exists merely to locate the switch statement within hexdsp's decompilation listing. Once it has found the switch, it extracts the case number from each case. It then extracts the first line of code in the body of the case statement, and passes it to another visitor. The second visitor, called a CallExtractor, extracts the addresses of all called functions (if any).
Finally, some glue code at the bottom of the script applies the visitors to the decompilation of hexdsp, and then renames the called functions after the hexcall_t enumeration elements whose numbers were collected from the switch statement.
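To make the division of labor concrete, here is a toy model of the two visitors in plain Python. It is not the Hex-Rays API and not the actual script: the node classes `Call`, `Stmt`, and `Switch`, the `plan_renames` helper, the addresses, and the enumerator names are all hypothetical stand-ins for the CTREE structures, and the result is only a rename plan rather than actual renaming.

```python
class Call:
    """Toy expression node: a direct call to the function at `target`."""
    def __init__(self, target):
        self.target = target

class Stmt:
    """Toy statement node holding zero or more expressions."""
    def __init__(self, *exprs):
        self.exprs = exprs

class Switch:
    """Toy switch node: maps a case value to the statements of its body."""
    def __init__(self, cases):
        self.cases = cases

class CallExtractor:
    """Collects the targets of all calls found in one statement."""
    def __init__(self):
        self.targets = []

    def visit_expr(self, expr):
        if isinstance(expr, Call):
            self.targets.append(expr.target)

    def apply_to(self, stmt):
        for expr in stmt.exprs:
            self.visit_expr(expr)

class SwitchExaminer:
    """Finds switch nodes and pairs each case value with its first call."""
    def __init__(self):
        self.case_to_target = {}

    def visit_insn(self, node):
        if not isinstance(node, Switch):
            return
        for value, body in node.cases.items():
            if not body:
                continue
            extractor = CallExtractor()
            extractor.apply_to(body[0])  # first line of the case body only
            if extractor.targets:
                self.case_to_target[value] = extractor.targets[0]

def plan_renames(tree, enum_names):
    """Glue: run the examiner, then name each target after its enumerator."""
    examiner = SwitchExaminer()
    for node in tree:
        examiner.visit_insn(node)
    return {target: enum_names[value]
            for value, target in examiner.case_to_target.items()
            if value in enum_names}

# Invented enumerator names and addresses, for illustration only.
enum_names = {0: "hx_example_first", 1: "hx_example_second", 2: "hx_example_third"}
tree = [
    Stmt(),
    Switch({
        0: [Stmt(Call(0x17001000))],
        1: [Stmt(), Stmt(Call(0x17002000))],  # first line has no call: skipped
        2: [Stmt(Call(0x17003000)), Stmt(Call(0x17004000))],
    }),
]
renames = plan_renames(tree, enum_names)
```

Cases whose first line does not call anything are simply left out of the plan, which mirrors the "if the first line does call a function" condition described above.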
That's all; after about 100 lines of heavily-commented IDAPython code, the script renames 436 functions for us automatically, nearly 10% of the non-library, non-thunk functions in the binary.
You can find the code here. A small note is that the code takes advantage of an IDAPython wrapper function regarding switch case values, which exists in IDA 7.2 beta, but does not exist in IDA 7.1 or below. In particular, you will not be able to run the script as-is until IDA 7.2 is released. If you would like to do something similar with a different use case that does not require you to extract switch case label values, you might be able to adapt the existing code for that purpose in the meantime.