Malware Analysis Lesson 8; The final boss of the assembly.

Last lesson was pretty heavy, covering even more instruction but also seeing how they come together to create the standard instruction sets that form the structures we expect to see from “regular” programming language like if statements, for and while loops. This time we are going to go into arrays, switch statements, windows API calls and some other bits and pieces as we try to round off these notes.

Switch statements are similar to IF statements only when we assign the variable a value and that value is used to select from a large number of switch options. It sounds abit weird but generally any menu we see where we chose from a selection of options uses switches. There are two common ways to implement switches, “if style” and “jump tables”.

If Style

If style switch statements are the standard one we think about where a variable is received, usually from the user, and the program goes through each switch case until a match is found and the code in that case is executed. If no match is found generally there is a default case that is used.

The above image shows the standard syntax for a high level switch statement. In Assembly this looks like;

Here we can see the variable i which is moved to the register EAX. We then use CMP command to try and find a match. If a match is found we execute the code, if its not we JMP to the next bit of assembly. On a side note this is actually quite a cool moment for me, reading through the assembly for this switch statement is now super easy. After the first 3 RE blog posts I am now able to recognize and understand the assembly commands used and the logic behind the flow of the code. If your getting lost when I mention EAX, JMP and CMP read my previous blog posts as they go through it all in detail. 🙂 Back on track now though. The JE command is new and means jump if equal. There is a good thread on it here. What JE does is it jumps to a given point if the cmp values match/are true. If they are not true than the code moves to the next line, in this case our JMP. At the end of this code block is an unconditional JMP to break out of the switch statement. By using obfuscated variables to match and a large number of switch cases this could be another area to obfuscate malware.

Jump Table

If styles work well with a limited number of options but as the number of cases grow performance can degrade. To prevent this and optimize the code we create jump tables. The jump tables define the offsets to the memory location of each separate case statements, acting as an index table with the switch variable as the search term. This allows for less comparisons needed by the code.

Switch jump table
High level vies of jump tables

In this example we can see a few things happening. Firstly the variable is subtracted by 1. This is because jump table index starts at 0. So in this case we see jump table cases 0-3 and not 1-4. Next we can see the code cmp each case label to see if it is out of range using JA, which is Jump if variable is above/greater than (description of this is here.).

Arrays

Arrays are made up similarly to code we have seen before. The main thing about arrays are how the values are assigned to the indices. In assembly Arrays are accessed using a base address as a starting point. Since these are both arrays of integers, each element is of size 4 and so each subsequent element can be accessed by a multiple of 4. This is because each entry is 4 bytes in size.

Arrays are simple data structures used to store similar data. In the example above we can see the square brackets used in the assembly to reference an array value. They are important for us to know though as malware is sometimes written to use and array of pointers to strings that contain multiple host names that are used as options for connections.

Calling conventions

Calling conventions are how we call functions. These function calls can appear differently in assembly code and calling conventions govern the way the function call occurs. Conventions include the order parameters are placed on the stacks and in the registers, as well as who is responsible for cleaning up the stack when the function is complete. There are different conventions and the convention used is specific to the compiler. Despite these differences we need to use certain universal conventions when using the Windows API.

CDECL

CDECL is one of the most common function call conventions and pushes parameters onto the stack from right to left. It is the responsibility of the caller to clean the stack after the function completes and the return value is stored in EAX. This is the convention thats been used in this blog. The way it cleans the stack is by using an ADD instruction to dereference the parameters pushed into stack by adding 8 bytes to the stack pointer.

STDCALL

This convention is similar to CDECL except it requires the callee to clean the stack after the function complete. The way this works is that the ADD instruction is not needed to clean the stack. STDCALL is used for Windows API calls. Any code calling these API functions will not need to clean up the stack after as the DLLs do this instead.

Windows

With that last bit done we are mostly finished with Assembly but there are a few tidbits on malware analysis I am going to include here. These bits are important as to understand malware functionality we also need to know the key components of the target OS outside of the assembly code that’s run. Lets focus on Windows APIs, The Registry and Networking API’s. The windows API is a broad set of functionality that governs the way that programs, including malware, can interact with the Microsoft libraries.

Hungarian Notation

Windows uses its own names to represent c value types, and some types like INT, SHORT and LONG are not used at all. We see this best in the windows registry where we have types like DWORD (32bit) and WORD(16bit) unsigned integers. Windows also uses Hungarian notation for API function identifiers. What this means is that a prefex is added to help identify the variable type such as dwSize as a parameter where the dw stands for DWORD.

Filesystem functions

One of the most common IoC’s for malware is when they create or modify files. When we are looking at an infection the file system and changes to the file system should be monitored and reviewed. We can uses some tools like Process Monitor for this. Some of the functions that can be called are;

  • CreateFile – This function is used to open files, pipes, streams and I/O devices as well as to create new files. The parameter dwCreationDisposition controls whether the call creats a new file or opens an existing one. Remember the dw prefix means ins a 32bit unsigned integer.
  • ReadFile and WriteFile – These functions are used for reading from and writing to files and operate on files as a stream. Both functions contain a parameter that signifies the number of bytes to read or write. That parameter controls the size of the data chunk that is read or written.
  • CreateFileMapping and MapViewOfFile – File mapping are commonly used by malware as they allow a file to be loaded into memory and manipulated. Creating a mapping loads a file from disk into memory while viewing a mapping returns a pointer to the base address of the mapping which can be used to access the file in memory. The program calling these function can use the pointer to read and write anywhere in the file.

The registry

The windows registry is used to store OS and program configuration information like settings or options. You are possibly familiar with it if you have used the regedit application. The registry is a hierarchical database of information and most windows configuration information is stored there including networking information, driver configuration, startup settings and useraccounts. The amount stored is quite vast and anyone looking for a career in IT should spend time going through it and learning the different categories. Knowing it is particularly helpful to malware analysts and malware uses the registry to gain persistance or to access or modify configuration data. The malware often adds entries into the registry that will allow it to run automatically when the computer boots.

The hierarchy

  • Root key – the registry has 5 top level sections called root keys or HKEY’s. Each root has a specific purpose or target.
  • Subkey – is a subfolder of the root key.
  • Key – is a folder in the registry that can contain additional folders or values. Root keys and sub keys are both “keys”.
  • Value entry – is an ordered name-value pair.
  • value or data – is the data stored in a registry entry.

Root keys

  • HKEY_LOCAL_MACHINE (HKLM) Stores settings that are global to the local machine.
  • HKEY_CURRENT_USER (HKCU) Stores settings specific to the current user.
  • HKEY_CLASSES_ROOT Stores information defining types.
  • HKEY_CURRENT_CONFIG Stores settings about the current hardware configuration, specifically differences between the current and standard configuration.
  • HKEY_USERS Defines setting s for the default user, new users and current users.

Common registry functions

Malware often uses registry functions that are part of the windows API in order to modify the registry to run automatically when the system boots. The most common registry functions used are;

  • RegOpenKeyEx – Opens a registry for editing and querying. There are functions that allow you to query and edit a registry key without opening it first, but most use RegOpenKeyEx.
  • RegSetValueEx – Adds a new value to the registry and sets it data.
  • RegGetValue – returns the data for a value entry in the registry.

These functions are used by malware and when we see them we should investigate the keys they are trying to access. There are also keys that will allow the malware to run at startup but many more deal with a systems security and settings so persistence may not be the only reason the malware is accessing the registry.

Networking API

Malware can also make use of network functions using API calls. There are many potential functions malware could take advantage of but of these option malware most commonly uses Berkeley Compatible Sockets. This allows malware to work across both windows and unix/linux systems. In windows this functionality is implemented in the Winsock Libraries, primarily in ws2_32.dll. Common functions include;

  • Socket – creates a socket.
  • Bind – Attaches a socket to a particular port, prior to accepting a call.
  • Listen – indicates that a socket will listen for incoming connections.
  • Accept – Opens a connection to a remote socket and accepts a connection
  • Connect – Opens a connection to a remote socket; the remote socket must be waiting for the connection.
  • Recv – Recieves data from the remote socket.
  • Send – Sends data to the remote socket.

The WSAStartup function must be called before any other networking functions in order to allocate resources for the networking libraries. When investigating network activity we need to consider both local and remote sides of the connection. The remote side usually maintains an open socket that is listening for incoming connections while the local side connects to that waiting port. Malware can use either side’s functionality and can act as a client(sending information to a C2) or a server (receiving instructions from a C2).

For a client side/local side application that connects to a remote socket we will see the socket call followed by the connect call, followed by send and recv.

For a server application/remote side that listens for incoming connections we will see a socket call, followed by a bind call, followed by a listen call and finally an accept call in this order. After this accept we will see send and recv calls.

Dynamic Link Libraries (DLL)

DLL’s are windows code libraries used by multiple applications. A DLL is an executable file that does not run alone but exports functions that can be used by other application. Static libraries still exist in windows but DLL’s are more common as they allow for code reuse and sharing. Sharing is possible as the single instance of the DLL loaded in memory can be accessed by multiple processes. #Optimized!

Unfortunatly Malware authors take advantage of DLLs in 3 primary ways;

  1. They use the basic Windows DLLs found on every system to interact with the OS. As malware analysts we can see what DLLs are being used by malware to gain an understanding for what is being achieved. With the static analysis we did previously we could get the functions called using strings.exe.
  2. They store malicious code in the DLL. Similar to how trojans work this can allow the author to attach their malware to multiple processes.
  3. Using third party DLLs the malware can also interact with other programs. We can see this when malware imports functions from a third party DLL and this can help us identify what the malwares goal is.

Processes

Malware can also execute code outside of whichever program it is in by creating new processes or modifying existing ones. In the past malware has been a stand alone process but now adays malware tries to avoid detection by running as part of another process, a process we trust. Windows uses processes as containers to manage resources and keep separate programs from interfering with each other. Malware can use CreateProcess to spawn a new process. With CreateProcess malware has a high degree of control over how the new process is created. For example the malware could use this to create a process to execute malicious code, or to create an instance of Internet Explorer(They still exist!) and direct the browser to access malicious sites and content. The most common thing for malware to do with a new process is to create a simple remote shell that the malware author can use to gain access to the machine without any other tools. Scary.

From endgame.com

Summary

It has taken us a few months but our series on Malware Analysis has finally come to an end. Its been an incredible journey for me growing from having no knowledge to where i am now. Through these blog posts we should now understand what IoC’s can suggest a malware infection, how to safely carry out static and dynamic analysis(including the tools to use!), how malware can hide from detection(and why signature based antivirus’ are no longer sufficient) and finally we spent a lot of time going through reverse engineering, expanding our under standing of assembly and the most common instructions we will see.

At this point if you have been following the blog you, like me, should now have strong entry level understanding of malware analysis and how malware analysts carry out their day to day jobs. From here the next step is practice. Grab malware samples, put them onto your lab and start analysis them and trying to reverse engineer them to understand what the code is doing. This is the only way to get familiar with the concepts and who knows, you might end up like Marcus Hutchins and stop the spread of the next WannaCry.

Best of luck!

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s