Forensic Blogs

An aggregator for digital forensics blogs

January 25, 2022 by Rolf Rolles

An Exhaustively Analyzed IDB for ComLook

This blog entry announces the release of an exhaustive analysis of ComLook, a newly-discovered malware family about which little information has been published. It was recently discovered by ClearSky Cyber Security, and announced in a thread on Twitter. You can find the IDB for the DLL here, in which every function has been analyzed, and every data structure has been recovered.

Like the previous two entries in this series on ComRAT v4 and FlawedGrace, I did this analysis as part of my preparation for an upcoming class on C++ reverse engineering. The analysis took about one and a half days (done on Friday and Saturday). ComLook is an Outlook plugin that masquerades as Antispam Marisuite v1.7.4 for The Bat!. It is fairly standard as far as remote-access trojans go; it spawns a thread to retrieve messages from a C&C server over IMAP, and processes incoming messages in a loop. Its command vocabulary is limited; it can only read and write files on the victim server, run commands and retrieve the output, and update/retrieve the current configuration (which is saved persistently in the registry). See the IDB for complete details.

(Note that if you are interested in the forthcoming C++ training class, it is nearing completion, and should be available in Q2 2022. More generally, remote public classes (where individual students can sign up) are temporarily suspended; remote private classes (multiple students on behalf of the same organization) are currently available. If you would like to be notified when public classes become available, or when the C++ course is ready, please sign up on our no-spam, very low-volume, course notification mailing list. (Click the button that says "Provide your email to be notified of public course availability".) )

This analysis was performed with IDA Pro 7.7 and Hex-Rays 32-bit. All analysis has been done in Hex-Rays; go there for all the gory details, and don't expect much from the disassembly listing. All of the programmer-created data structures have been recovered and applied to the proper Hex-Rays variables. The functionality has been organized into folders, as in the following screenshot:

The binary was compiled with MSVC 10.0 with RTTI, and uses standard C++ template containers:

string/wstring

shared_ptr

vector

list

map

The primary difficulty in analyzing this sample was that it was compiled in debug mode. Although this does simplify some parts of the analysis (e.g., error messages contain the raw STL typenames), it also slows the speed of comprehension due to a lack of inlining, and includes a huge amount of code to validate so-called C++ debug iterators. This makes locating the real programmer-written functionality more time-consuming. STL functions involving iterators that, in a release build, would have consumed less than a page of decompilation, often consumed five or more pages due to debug iterator validation.

Read the original at: Blog - Möbius Strip Reverse Engineering

Filed Under: Malware Analysis

September 21, 2021 by Rolf Rolles

Automation in Reverse Engineering C++ STL/Template Code

There are three major elements to reverse engineering C++ code that uses STL container classes:

Determining in the first place that an STL container is being used, and which category, i.e., std::list<T> vs. std::vector<T> vs. std::set<T>

Determining the element type, i.e., T in the categories above

Creating data types in your reverse engineering tool of choice, and applying those types to the decompilation or disassembly listing.

Though all of those elements are important, this entry focuses on the last one: creating instantiated STL data types, and more specifically, types that can be used in Hex-Rays. The main contribution of this entry is simply its underlying idea, as I have never seen it published anywhere else; the code itself is simple enough, and can be adapted to any reverse engineering framework with a type system that supports user-defined structures.

I have spent the pandemic working on a new training class on C++ reverse engineering; the images and concepts in this blog entry are taken from the class material. The class goes into much more depth than this entry, with material on structure and type reconstruction, and individual sections on each of the common STL containers.

(If you are interested in the forthcoming C++ training class, it will be completed early next year, and available for in-person delivery when the world is more hospitable. If you would like to be notified when public in-person classes for the C++ course are ready, please sign up on our no-spam, very low-volume, course notification mailing list. (Click the button that says "Provide your email to be notified of public course availability".) )

Overview and Motivation

At a language level, C++ templates are one of the most complex features of any mainstream programming language. Their introduction in the first place -- as opposed to a restricted, less-powerful version -- was arguably a bad mistake. They are vastly overcomplicated, and in earlier revisions, advanced usage was relegated to true C++ experts. Over time, their complexity has infested other elements of the language, such as forming the basis for the C++11 auto keyword. However, the basic, original ideas behind C++ templates were inconspicuous enough, and are easy to explain to neophytes. Moreover, reverse engineers do not need to understand the full complexity of C++ templates for day-to-day work.

Let's begin with a high-level overview of the problems in C software development that C++ templates endeavored to solve, and roughly how they solved them. Put simply, many features of C++ were designed to alleviate situations where common practice in C was to copy and paste existing code and tweak it slightly. In particular, templates alleviate issues with re-using code for different underlying data types.

C does offer one alternative to copy-and-paste in this regard -- the macro preprocessor -- though it is a poor, cumbersome, and limited solution. Let's walk through a small real-world example. Suppose we had code to shuffle the contents of a char array, and we wanted to re-use it to shuffle int arrays.

Example-Shuffle-C.png

We could simply search for char and replace it with int in this code, and also rename both functions (to, e.g., swap_int and shuffle_int). However, this is a poor solution; it would be nice if there was a language mechanism to do this automatically. Over time, C programmers came to use the macro preprocessor for problems like this. The following two-phase animated GIF shows the first step: wrapping the existing code in a macro declaration.

AnimMacros1.gif

Next, we want to replace the elements that we wish to alter with macro parameters. In the following two-phase animation, we begin by replacing char with the macro parameter TYPE:

AnimMacros2.gif

Finally, we use the token-pasting operator ## to rename the functions:

AnimMacros3.gif

Now, the process of converting the code to a macro is complete. We can do the same for the function declaration:

ShuffleHead.png

And then put both the source and header macros into a single header file:

ShuffleHeadBoth.png

Now, we can use this header to create shuffle instantiations for any type we choose. For each desired type, simply include the header file from above and instantiate the macros:

ShuffleCharInt.png

Then, the following three-phase animated GIF shows how the C preprocessor expands the macros for any given instantiation:

AnimMacroExpand.gif

A real-world example can be seen in the uthash library.
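Since the walkthrough above relies on images, here is a minimal text sketch of the same macro technique. It is not the code from the screenshots: the macro name DEFINE_SHUFFLE, the Fisher-Yates loop, and the use of rand() are assumptions made for illustration. It is written so that it also compiles as C++, to match the later examples.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical macro: DEFINE_SHUFFLE and the generated names are illustrative.
// TYPE is substituted textually, and ## pastes it into the function names,
// so DEFINE_SHUFFLE(int) produces swap_int and shuffle_int.
#define DEFINE_SHUFFLE(TYPE)                                           \
    void swap_##TYPE(TYPE *a, TYPE *b) {                               \
        TYPE tmp = *a;                                                 \
        *a = *b;                                                       \
        *b = tmp;                                                      \
    }                                                                  \
    void shuffle_##TYPE(TYPE *arr, std::size_t n) {                    \
        assert(arr != nullptr || n == 0);                              \
        for (std::size_t i = n; i > 1; --i) {                          \
            std::size_t j = static_cast<std::size_t>(std::rand()) % i; \
            swap_##TYPE(&arr[i - 1], &arr[j]);                         \
        }                                                              \
    }

// One instantiation per element type, each yielding separately named functions.
DEFINE_SHUFFLE(char)
DEFINE_SHUFFLE(int)
```

After these two instantiations, shuffle_char(buf, len) and shuffle_int(nums, count) are ordinary functions; the preprocessor has done the copy-and-rename step mechanically.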

The C++ template solution looks very similar to the C macro solution:

ShuffleTemplate.png

The differences are that:

TYPE becomes a template typename parameter.

We don't need the line continuation characters (i.e., \).

We don't need to rename the function.
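The three differences can be seen in a minimal template sketch. As before, this is an illustration and not the code from the screenshot; the names swap_any and shuffle_any and the Fisher-Yates loop are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// TYPE is now a template parameter rather than a macro parameter.
template <typename TYPE>
void swap_any(TYPE *a, TYPE *b) {
    TYPE tmp = *a;
    *a = *b;
    *b = tmp;
}

// No line continuations and no token pasting: one name serves every type,
// and the compiler generates a separate instantiation per TYPE on demand.
template <typename TYPE>
void shuffle_any(TYPE *arr, std::size_t n) {
    assert(arr != nullptr || n == 0);
    for (std::size_t i = n; i > 1; --i) {
        std::size_t j = static_cast<std::size_t>(std::rand()) % i;
        swap_any(&arr[i - 1], &arr[j]);
    }
}
```

Calling shuffle_any on an int array and on a double array causes the compiler to emit two distinct functions in the binary, which is why a reverse engineer sees near-duplicate code for each element type.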

The same concept really shines in the area of generic data structures. For example, here is the difference between a hypothetical List data structure as a C macro and as a C++ template. (Note that this definition of List is unrelated to the official C++ std::list, mentioned below.)

DataStructMacroTemplate.png

C++ Template Containers

The C++ standard template library (STL) contains high-quality, generic implementations of common data structures. They are one of the best features of C++, and are used extensively by C++ programmers in real code. As hinted in the example above, they allow C++ programmers to immediately create instances with any types they choose.

One of the most popular STL data structures is known as std::vector. Vectors are like a better version of C arrays:

Like arrays, std::vector offers fast lookups using the same syntax (i.e., arr[idx] vs. vec[idx]).

Unlike arrays, std::vector knows the size of the underlying memory (i.e., vec.size()).

Unlike arrays, std::vector can grow and shrink dynamically, without requiring the programmer to manually manage the memory (i.e., vec.push_back(1) and vec.pop_back()).

Unlike arrays, std::vector offers bounds-checked indexing -- an important step for mitigating C's memory safety issues. I.e., vec.at(500) will throw an exception if vec.size() <= 500.

Programmers can obtain a pointer to the underlying array for compatibility with existing code that requires such pointers. Starting from C++11, this can be accomplished via vec.data().

Unlike arrays, std::vector can own the elements it contains, such that deleting the vector will delete those elements. This simplifies resource management.
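The properties listed above can be exercised with a few lines of standard C++. The helper names below (make_demo_vector, at_throws, sum_raw) are hypothetical and exist only for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Grows and shrinks a vector without any manual memory management.
std::vector<int> make_demo_vector() {
    std::vector<int> vec;
    vec.push_back(1);
    vec.push_back(2);
    vec.push_back(3);
    vec.pop_back();  // vec now holds {1, 2}
    return vec;
}

// Returns true if bounds-checked access at idx throws std::out_of_range.
bool at_throws(const std::vector<int> &vec, std::size_t idx) {
    try {
        (void)vec.at(idx);
        return false;
    } catch (const std::out_of_range &) {
        return true;
    }
}

// Stands in for legacy code that only accepts a raw pointer and a length.
int sum_raw(const int *p, std::size_t n) {
    assert(p != nullptr || n == 0);
    int total = 0;
    for (std::size_t i = 0; i < n; ++i) total += p[i];
    return total;
}
```

With vec = make_demo_vector(), vec[1] and vec.at(1) both yield 2, vec.at(500) throws, and sum_raw(vec.data(), vec.size()) passes the underlying array to pointer-based code.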

MSVC's implementation of std::vector is rather simple; when stripped to its core, it simply contains three pointers:

VectorDecl.png

(Before proceeding, it is important to note that the C++ standard specifies the class interfaces for these data structures, but not their internals. Different implementations of the STL -- such as the ones used by MSVC, and by GCC -- implement these container classes differently. Any concrete examples below come from MSVC; the data structures can and will differ for other STL implementations, but the idea in this blog entry can be adapted to any of them.)

These pointers locate the base of the allocation, the end of the used elements (one past the last used element), and the end of the allocated storage, as in the following image:

VectorPointers.png
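In text form, the three-pointer layout can be sketched as the following facsimile. This is an illustration, not MSVC's actual header: the struct name VectorFacsimile is hypothetical, and the member names mirror MSVC's _Myfirst, _Mylast, and _Myend with the leading underscore dropped, since such identifiers are reserved in user code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for the core of MSVC's std::vector<T>.
template <typename T>
struct VectorFacsimile {
    T *Myfirst;  // base of the allocation
    T *Mylast;   // one past the last used element
    T *Myend;    // one past the end of the allocated storage

    // Both quantities fall out of simple pointer subtraction.
    std::size_t size() const {
        assert(Myfirst <= Mylast);
        return static_cast<std::size_t>(Mylast - Myfirst);
    }
    std::size_t capacity() const {
        assert(Myfirst <= Myend);
        return static_cast<std::size_t>(Myend - Myfirst);
    }
};

// Every instantiation has the same shape; only the pointee type differs.
static_assert(sizeof(VectorFacsimile<int>) == 3 * sizeof(int *),
              "three pointers and nothing else");
static_assert(sizeof(VectorFacsimile<std::string>) == sizeof(VectorFacsimile<int>),
              "layout does not depend on T");
```

This is also why code like (last - first) divided by the element size shows up in decompilation wherever size() was inlined.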

Upon instantiating std::vector<T> with a concrete type for T, the C++ compiler will create a new class layout using the proper T pointers. For example, here we can see two different instantiations for int and string:

VectorIntString.png

Reverse Engineering STL Template Containers

When reverse engineering code that makes use of std::vector, after discovering that a std::vector is indeed in use, the next step is to discover which type was used for T. This comes down to standard structure recovery, and will not be covered in this blog entry. However, my upcoming C++ reverse engineering class will cover these steps in depth. Let's imagine that the reverse engineer discovers they are dealing with a std::vector<int>.

The next step is for the reverse engineer to recreate a data structure compatible with std::vector, and to apply those types to the decompilation or disassembly listing. The reverse engineer could simply create a type like this:

ReVectorInt.png

And then apply it to the decompilation. The code might look like this, before (left) and after (right):

HRVectorBeforeAfter.png

Although creating a compatible std::vector facsimile is not very difficult, creating these structures can be tedious work. For example, my ComRAT IDB contains std::vector instantiations for 16 different types T. Moreover, other STL containers are more complex; for example, std::map actually consists of 5 different, related structures.

The primary contribution of this blog entry is to simply inform the reader that they can create these structures programmatically. In IDA and Hex-Rays, this is as simple as using format strings and an API named parse_decls.

ScriptVectorExample.png

One simply calls this function like MakeVectorType("int"), and one is greeted with a new type named vector_int in the local types window. For more complex scenarios where the type cannot be used in the struct name (for example, int* would lead to a struct with the invalid name vector_int*), the second, optional parameter can be used to create the name of the struct. I.e., MakeVectorType("int*","pInt") would create a struct named vector_pInt whose pointer types were int *.
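The string-templating step is simple enough to sketch in text. The original script is written for IDA's scripting environment; the version below is a hypothetical C++ analogue (make_vector_decl is not part of the script) that only builds the declaration text, and it omits the call to parse_decls because that API exists only inside IDA. The field names are an assumption based on MSVC's naming.

```cpp
#include <cassert>
#include <string>

// Builds the C declaration for a std::vector<T> facsimile.
// `type` is the element type T; `name`, if given, replaces T in the
// struct name for types that are not valid identifiers (e.g. "int*").
std::string make_vector_decl(const std::string &type,
                             const std::string &name = "") {
    assert(!type.empty());
    const std::string suffix = name.empty() ? type : name;
    std::string decl;
    decl += "struct vector_" + suffix + " {\n";
    decl += "  " + type + " *_Myfirst;\n";
    decl += "  " + type + " *_Mylast;\n";
    decl += "  " + type + " *_Myend;\n";
    decl += "};\n";
    return decl;
}
```

Here make_vector_decl("int") yields the text of a struct named vector_int, and make_vector_decl("int*", "pInt") yields vector_pInt; inside IDA, that text is what would be handed to parse_decls to create the local type.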

And... that's more or less it! I have collected the common STL data types into the linked script, such that once the reverse engineer has figured out which types are being used, they can immediately create the structures with no effort on their part. Those types are:

std::vector

std::list

std::map and std::set

std::deque

std::shared_ptr

For one small added benefit, my script also knows the function prototypes for some common STL container class member functions. The reverse engineer can simply copy and paste these into the function prototype type declaration to immediately set a valid type for these member functions.

These same ideas can be adapted to other STL implementations, as well as custom template libraries. For example, Valve uses their own STL-esque template library; a few minutes' worth of work can be used to produce a script comparable to the above.

Not so bad, is it? Reverse engineering C++ template code does not have to be difficult.

Read the original at: Blog - Möbius Strip Reverse Engineering

Filed Under: Malware Analysis

June 1, 2021 by Rolf Rolles

Hex-Rays, GetProcAddress, and Malware Analysis

This entry is about how to make the best use of IDA and Hex-Rays with regard to a common scenario in malware analysis, namely, dynamic lookup of APIs via GetProcAddress (and/or import resolution via hash). I have been tempted to write this blog entry several times; in fact, I uploaded the original code for this entry exactly one year ago today. The problem that the script solves is simple: given the name of an API function, retrieve the proper type signature from IDA's type libraries. This makes it easier for the analyst to apply the proper types to the decompilation, which massively aids readability and presentability. No more manually looking up and copying/pasting API type definitions, or ignoring the problem due to its tedious solution; just get the information directly from the IDA SDK. Here is a link to the script.

Background

Hex-Rays v7.4 introduced special handling for GetProcAddress. We can see the difference -- several of them, actually -- in the following two screenshots. The first comes from Hex-Rays 7.1:

HR71.png

The second comes from Hex-Rays 7.6:

HR76.png

Several new features are evident in the screenshots -- more aggressive variable mapping eliminating the first two lines, and automatic variable renaming changing the names of variables -- but the one this entry focuses on has to do with the type assigned to the return value of GetProcAddress. Hex-Rays v7.4+ draws upon IDA's type libraries to automatically resolve the name of the procedure to its proper function pointer type signature, and sets the return type of GetProcAddress to that type.

This change is evident in the screenshots above: for 7.1, the variable is named v7, its type is the generic FARPROC, and the final line shows a nasty cast on the function call. For 7.6, the variable is named IsWow64Process, its type is BOOL (__stdcall *)(HANDLE, PBOOL) (the proper type signature for the IsWow64Process API), and the final line shows no cast. Beyond the casts, we can see that applying the type signature also changes the types of other local variables: v5 in the first has the generic type int, whereas v5 has the proper type BOOL in the second.

These screenshots clearly demonstrate that IDA is capable of resolving an API name to its proper type signature, the desirable effects of applying the proper type signature on readability, and the secondary effects of setting the types of other variables involved in calling those APIs.
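The difference between the two decompilations can be imitated in portable C++. Everything in this sketch is a stand-in so that it compiles without the Windows headers: GenericProc plays the role of FARPROC, lookup_proc plays the role of GetProcAddress, and is_even is a hypothetical function standing in for the resolved API.

```cpp
#include <cassert>
#include <cstring>

typedef void (*GenericProc)();  // generic function pointer, like FARPROC

// Hypothetical "API" that writes its answer through an out-parameter.
int is_even(int value, int *result) {
    *result = (value % 2 == 0) ? 1 : 0;
    return 1;
}

// Hypothetical by-name lookup that returns only the generic pointer type.
GenericProc lookup_proc(const char *name) {
    if (std::strcmp(name, "is_even") == 0)
        return reinterpret_cast<GenericProc>(&is_even);
    return nullptr;
}

// Untyped style: the variable keeps the generic type, so the call needs a cast.
int call_with_cast(int value) {
    GenericProc v7 = lookup_proc("is_even");
    assert(v7 != nullptr);
    int v5 = 0;
    reinterpret_cast<int (*)(int, int *)>(v7)(value, &v5);
    return v5;
}

// Typed style: the variable carries the real signature, so the call is clean.
typedef int (*IsEvenFn)(int, int *);

int call_typed(int value) {
    IsEvenFn is_even_fn = reinterpret_cast<IsEvenFn>(lookup_proc("is_even"));
    assert(is_even_fn != nullptr);
    int result = 0;
    is_even_fn(value, &result);
    return result;
}
```

Both functions behave identically; the second simply reads the way the decompilation does once the function pointer variable has been given its proper type.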

Relevance to Malware Analysis

Hex-Rays' built-in functionality won't work directly when malware looks up API names by hash, or uses encrypted strings for the API names: the decompiler must see a fixed string being passed to GetProcAddress to do its magic. Although the malware analysis community seems very comfortable in dealing with imports via hash and encrypted strings, they seem less comfortable with applying proper type signatures to the resultant variables and structure members. Only one publication I'm aware of bothers to tackle this, and it relies upon manual effort to retrieve the type definitions and create typedefs for them. This is unfortunate, as applying said types dramatically cleans up the decompilation output, but this is understandable, as the manual effort involved is rather cumbersome.

As a result, most publications that encounter this problem feature screenshots like this one (note all of the casts on the function pointer invocations, and the so-called "partial types" _QWORD etc):

AnalysisWithCasts.png

(I chose not to link the analysis from which the above screenshot was lifted, because my goal here is positive assistance to the malware analysis community, and not to draw negative attention to anyone's work in particular. This pattern is extremely frequent throughout presentations of malware analysis; it is immaterial who authored the screenshot above, and I had other examples to choose from.)

The Solution

I did not know how to resolve an API name to its type signature, so I simply reverse engineered how Hex-Rays implements the functionality mentioned at the top of this entry. The result is a function PrintTypeSignature(apiName) you can use in your scripts and day-to-day work that does what its name implies: retrieves and prints the type signature for an API specified by name.

The script includes a demo function Demo() that resolves a number of API type signatures and prints them to the console. It begins by declaring a list of strings:

PyDemo.png

The output of the script is the type signatures, ready to be copied and pasted into the variable type window and/or a structure declaration.

ScriptOutput.png

A Final Note

Architecturally, there is a discrepancy between how the Hex-Rays microcode machinery handles type information for direct calls versus indirect ones. To summarize, you may still see casts in the output after applying the proper type signature to a variable or structure member. If this happens, right-click on the indirect call and press Force call type to force the proper type information to be applied at the call site. However, only do this once you have set the proper type information for the function pointer variable or structure member.

ForceCallType.png

Mostly I published this because I want to see more types applied, and fewer casts, in the malware analysis publications that I read. Please, use my script!

Read the original at: Blog - Möbius Strip Reverse Engineering

Filed Under: Malware Analysis
