Scrammed!: Why Flame is a pain to analyze - a look at its intricate compilation style.

Introduction

This post is about some peculiarities of the assembly code of Flame, the malware infiltrating Iranian computers. Note that I'm not going to give you any additional details or new findings about its analysis; if you are interested in that, I suggest reading the report written by CrySyS, which is by far the most comprehensive available description of its different components.

That said, although the main functionalities of Flame have been identified, there's still a lot of undocumented code. For those of you who want to perform your own analysis, I hope it will help to understand more about its compilation style, and that's why I'm writing these little notes.

To do that, I decided to discuss a specific routine in the "advnetcfg.ocx" file: the RC4 encryption routine. In particular, I focused on retrieving its key.
I'm not the first one to find it, as it also appears in the CrySyS report cited above (although the report doesn't describe the procedure); the point of this post is to show you how a standard task like this one is made intricate and time-consuming by the compilation style.

This is only one example of this kind of structured code, which you will find all over the malware. Of course, it isn't the only peculiarity that makes the code hard to understand: maybe there will be a sequel continuing this discussion.

First, we will describe how to deal with the RC4 algorithm in order to identify which parameter holds the key. Even knowing that, however, won't be enough to find its content directly, and we will have to go through some intricate code to track its value back.

Let's get started.


Analyzing RC4

Looking at the code, we notice the following loop:

.text:1002598F mov [eax+ecx], al
.text:10025992 inc eax
.text:10025993 cmp eax, 100h
.text:10025998 jl short loc_1002598F

This is a typical hint for recognizing the RC4 algorithm, as the loop fills a 0x100-byte (256 in decimal) array with the values 0 to 255: the initial permutation box. Just compare it to one of the RC4 implementations available online (this, for instance), and look for the assembly-C correspondence:

for (i = 0; i < 256; i++)
    state->perm[i] = (u_char)i;


Then we can see another clear sign of RC4:

.text:1002599C   mov [ecx+100h], dl
.text:100259A2   mov [ecx+101h], dl


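These are the two bytes right after the 256-byte permutation array, at offsets 0x100 and 0x101, so the code clearly refers to: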

state->index1 = 0;
state->index2 = 0;


Putting these lines together, we get the RC4 "state" structure and can tell that we are in the "rc4_init" function. You can also notice that the "rc4_crypt" function follows in the next lines, probably because the code was just copied from a source similar to the one we are referring to.

We also know that the prototype of the "rc4_init" function is:

void rc4_init(struct rc4_state *const state, const u_char *key, int keylen);

But in the assembly code we see only two parameters:

.text:10025986 arg_0 = dword ptr 8
.text:10025986 arg_4 = dword ptr 0Ch

This is odd: one of the three parameters is missing. Why? For the moment, let's just say that the answer is related to the intricate nature of the code, which I will clarify later.

First let's look for the code that uses the key. In the C code we have:

j += state->perm[i] + key[i % keylen];


We are interested in finding the assembly counterpart of the last addend, key[i % keylen]:

.text:100259DA   idiv [ebp+arg_4]


Since idiv leaves the remainder of the division in edx, this is where i % keylen is computed, so arg_4 is the key length. Moreover:

.text:100259DD   inc [ebp+var_8]
.text:100259E0 cmp [ebp+var_8], 100h
.text:100259E7 jl short loc_100259AF


So, var_8 in the assembly code is the counter i in the C code. To find the key, we now have to look for an instruction that reads one byte from memory. This consideration leads us to:

mov bl, [esi+edi]


In particular, we are interested in edi, which comes from arg_0:

.text:100259B2   mov edi, [ebp+arg_0]


that is... the key!
Well, here we are... we found the key parameter... but are we done? Usually the answer would be "yes", but in this case there's more work to do, and this is where the code becomes intricate.
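Putting the pieces together, here is a minimal C sketch of the two routines, modeled on the public RC4 source quoted above rather than decompiled from Flame. It keeps the three-parameter prototype (so it does not reproduce the missing parameter) and renames the key-scheduling variables after IDA's labels: arg_0 for the key, arg_4 for the key length and var_8 for the counter i. The comments only point at the instructions shown in the listings above; the rc4_crypt body is the standard RC4 keystream loop, assumed to match the one that follows rc4_init in the binary.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned char u_char;

/* RC4 state laid out like the source quoted above: the 256-byte
   permutation box followed by the two one-byte indices. */
struct rc4_state {
    u_char perm[256];   /* +000h..+0FFh: permutation box */
    u_char index1;      /* +100h: cleared by mov [ecx+100h], dl */
    u_char index2;      /* +101h: cleared by mov [ecx+101h], dl */
};

/* Byte-only members need no padding, so these offsets hold on any compiler. */
static_assert(offsetof(struct rc4_state, index1) == 0x100, "index1 at 100h");
static_assert(offsetof(struct rc4_state, index2) == 0x101, "index2 at 101h");

/* Key scheduling with IDA's names: arg_0 = key, arg_4 = keylen, var_8 = i. */
void rc4_init(struct rc4_state *const state, const u_char *arg_0, int arg_4)
{
    int var_8;
    u_char j = 0;                         /* u_char, so "+=" wraps mod 256 */

    assert(arg_4 > 0);                    /* idiv by a zero keylen would fault */

    /* mov [eax+ecx], al / inc eax / cmp eax, 100h */
    for (var_8 = 0; var_8 < 256; var_8++)
        state->perm[var_8] = (u_char)var_8;
    state->index1 = 0;
    state->index2 = 0;

    /* inc [ebp+var_8] / cmp [ebp+var_8], 100h */
    for (var_8 = 0; var_8 < 256; var_8++) {
        /* var_8 % arg_4 is the remainder computed by idiv [ebp+arg_4];
           the key byte is read by mov bl, [esi+edi], with edi = arg_0 */
        j += state->perm[var_8] + arg_0[var_8 % arg_4];
        u_char tmp = state->perm[var_8];  /* swap perm[i] and perm[j] */
        state->perm[var_8] = state->perm[j];
        state->perm[j] = tmp;
    }
}

/* Standard RC4 keystream generation, XORed into the output buffer. */
void rc4_crypt(struct rc4_state *const state, const u_char *in, u_char *out, int len)
{
    for (int n = 0; n < len; n++) {
        state->index1++;
        state->index2 += state->perm[state->index1];
        u_char tmp = state->perm[state->index1];
        state->perm[state->index1] = state->perm[state->index2];
        state->perm[state->index2] = tmp;
        out[n] = in[n] ^ state->perm[(u_char)(state->perm[state->index1]
                                               + state->perm[state->index2])];
    }
}
```

With the widely published test key "Key", encrypting "Plaintext" should yield BB F3 16 E8 D9 40 AF 0A D3; re-initializing the state with the same key and running rc4_crypt on the ciphertext gives the plaintext back, since RC4 simply XORs the data with the keystream.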


Tracking the key

Now we know that the key is passed to the "rc4_init" function as the first argument, and we want to track it back to see its content. So, we follow the code through the cross-references and notice that eax corresponds to arg_0, since it is the last value pushed before the call to "rc4_init" (arguments are pushed right to left, so the last push is the first argument):

.text:1000E69F call get_key_object
.text:1000E6A4 push eax
.text:1000E6A5 lea ecx, [esi+4]
.text:1000E6A8 call rc4_init

What about eax?

It is the return value of the "get_key_object" call, whose body is:

.text:1000C537   mov eax, [ecx+4]
.text:1000C53A   mov eax, [eax+0Ch]
.text:1000C53D   add eax, [ecx+8]
.text:1000C540   retn

A little remark: by convention (Microsoft's thiscall calling convention), the C++ "this" pointer is passed in the ecx register. If you are interested in reversing C++ applications, you should read this paper as a starting point. More info about the "this" pointer can be found here.

Basically, the code above reads a pointer from the object, reads a second pointer through it, and adds an index to the result, obtaining the final pointer to the key. In other words, you can picture the whole thing as a "memory buffer" object that contains a pointer to the data and an index to access it.
Something like this:


         this
   +---------------+
00 |      ...      |               Obj_data
   +---------------+           +---------------+
04 | ptr Obj_data  | --------> |      ...      | 00
   +---------------+           +---------------+
08 |     Index     |           |      ...      | 04
   +---------------+           +---------------+
   |      ...      |           |      ...      | 08          Key
                               +---------------+            +--+
                               | ptr byte Key  | 0C ------> |  | 0
                               +---------------+            +--+
                               |      ...      |            |  | 1
                                                            +--+
                                                            |..| 2
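For readers who prefer C, the object and its accessor could be modeled roughly as follows. All type and field names (mem_buffer, obj_data, key_bytes, unknown...) are hypothetical; only the offsets 04h, 08h and 0Ch come from the listing, and those offsets hold only for a 32-bit build with 4-byte pointers, which is what the comments assume.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout of Obj_data; offsets in comments assume 32-bit x86. */
struct obj_data {
    void *unknown[3];            /* +00h, +04h, +08h: not analyzed */
    unsigned char *key_bytes;    /* +0Ch: ptr byte Key */
};

/* Hypothetical "memory buffer" object pointed to by ecx ("this"). */
struct mem_buffer {
    void *unknown0;              /* +00h: not analyzed */
    struct obj_data *data;       /* +04h: ptr Obj_data */
    size_t index;                /* +08h: Index */
};

/* C equivalent of get_key_object:
     mov eax, [ecx+4]    ; eax = this->data
     mov eax, [eax+0Ch]  ; eax = this->data->key_bytes
     add eax, [ecx+8]    ; eax += this->index
     retn                ; result returned in eax */
unsigned char *get_key_object(const struct mem_buffer *self)
{
    assert(self != NULL && self->data != NULL);
    return self->data->key_bytes + self->index;
}
```

For example, with index = 2 and key_bytes pointing to {0x10, 0x20, 0x30, 0x40}, get_key_object returns a pointer to the byte 0x30.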

Now we have to follow ecx back from the "get_key_object" call, and we see:

.text:1000E69C   lea ecx, [ebp+var_20]


So, we want to investigate when "var_20" is filled with a value.

.text:1000E67F mov esi, ecx
.text:1000E681 push [ebp+arg0]
.text:1000E684 lea eax, [ebp+var_20]
.text:1000E687 lea ebx, [esi+108h]
.text:1000E68D push eax
.text:1000E68E call key_from_arg0?


From the code above we may think that the key is passed through arg0, but if we try to follow arg0 via Cross Reference we don't go very far:

.text:1000E5CE push 0
.text:1000E5D0 lea eax, [ebp+var_20]
.text:1000E5D3 push eax
.text:1000E5D4 xor ebx, ebx
.text:1000E5D6 call instantiate_object
.text:1000E5DB mov byte ptr [ebp+var_4], 2
.text:1000E5DF push eax
.text:1000E5E0 mov ecx, esi
.text:1000E5E2 call do_rc4


arg0 is the first parameter of the function we were in before following the cross-reference (let's call it "do_rc4"), so we have to follow eax, which is the return value of the "instantiate_object" function. This call takes 0 and var_20 as its parameters and returns an empty object.

A dead end, then... or maybe not! Let's reconsider the parameters passed to the "key_from_arg0?" function: maybe the parameter we are interested in isn't passed on the stack, but in a register... Maybe the missing piece is the instruction:

.text:1000E687 lea ebx, [esi+108h]

and we have to follow esi+108h instead of arg0!

At the top of the "do_rc4" function we notice:

.text:1000E67F mov esi, ecx


So, esi+108h is a field of the object whose "this" pointer is passed to the "do_rc4" function in ecx.

Now let's follow the cross-reference back; scrolling up from the call to "do_rc4", we notice:

.text:1000E5B2 push [ebp+p_key_bytes]
.text:1000E5B5 mov ebx, [ebp+arg_8]
.text:1000E5B8 lea eax, [esi+108h]
.text:1000E5BE push eax
.text:1000E5BF mov dword ptr [esi], offset off_10073520
.text:1000E5C5 call instantiate_object


This totally makes sense! There is a second call to the "instantiate_object" function, and this time its parameters are p_key_bytes and esi+108h. This suggests that the function creates an object with the bytes of the key from p_key_bytes and puts its address in esi+108h.

OK, here we go... again! The reasoning is recursive: let's call the function we are in "do_rc4_2" and follow p_key_bytes via cross-reference to see where it is filled.

.text:1000129A lea ecx, [ebp+58h]
.text:1000129D call get_key_object
.text:100012A2 push eax
.text:100012A3 lea eax, [ebp-1F4h]
.text:100012A9 push eax
.text:100012AA call do_rc4_2

"p_key_bytes" is the second parameter of "do_rc4_2", so to investigate its value we have to follow eax, which is... the return value of the "get_key_object" function we have already described. It reads an object at the address contained in ecx... that is... ebp+58h! Really, really weird!
Why ebp+58h? Are there really so many parameters on the stack?

In order to understand the situation properly, we have to go to the beginning of the function we are in, the one that calls "do_rc4_2":

.text:10001230 push ebp
.text:10001231 sub esp, 48h
.text:10001234 mov eax, offset sub_1006A3CF
.text:10001239 call __EH_prolog

To skip some boring calculations, let's just say that "__EH_prolog" sets the value of ebp to esp-4. So, after these instructions are executed, the stack will look like this:

... [prolog][48h bytes][ebp][ret_addr][param_1][param_2] ...

So:

prolog + 48h + ebp + ret_addr + param_1 = 4h + 48h + 4h + 4h + 4h = 58h

That adds up! It means that ebp+58h points to param_2.
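The same bookkeeping can be written down as a tiny helper, which makes it easy to redo the computation for other frames. It simply encodes the layout above and inherits its assumptions: ebp points at the 4-byte prolog slot, and every parameter before the one we want is a 4-byte dword. The function name param_offset is hypothetical, and this is not a general model of __EH_prolog frames.

```c
#include <assert.h>

/* Offset from ebp of the n-th stack parameter (n = 1 for param_1), for the
   frame layout described above:
     [prolog][locals][saved ebp][ret_addr][param_1][param_2] ...
   assuming ebp points at the 4-byte prolog slot and that every parameter
   before the n-th one is a 4-byte dword. */
unsigned param_offset(unsigned locals_size, unsigned n)
{
    const unsigned prolog = 4, saved_ebp = 4, ret_addr = 4, dword = 4;
    assert(n >= 1);
    return prolog + locals_size + saved_ebp + ret_addr + dword * (n - 1);
}
```

param_offset(0x48, 2) gives 0x58, matching the computation above, and param_offset(0x6C, 1) gives 0x78, the value needed for the 6Ch-byte frame we meet further on.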

Once again, we call the function we are in "go", and look for the second parameter of "go" via cross-reference.

.text:10003254 sub esp, 14h
.text:10003257 mov eax, esp
.text:10003259 mov [ebp+78h], esp
.text:1000325C push eax
.text:1000325D mov ebx, [ebp+68h]
.text:10003260 call do_newcopy_addref
.text:10003265 mov byte ptr [ebp-4], 2
.text:10003269 push dword_10091C08
.text:1000326F mov byte ptr [ebp-4], 1
.text:10003273 call go

And here comes the problem... we are looking for the second parameter, but there's only one push! Don't panic.

Let's look at the code: first it reserves 14h bytes on the stack with the sub esp, 14h instruction, and then it calls the "do_newcopy_addref" function, which copies something from the value stored at ebp+68h into that freshly reserved area (once again, the value is passed via register: mov ebx, [ebp+68h]!).

So, we have to work out again what the stack looks like:

... [prolog][48h bytes][ebp][ret_addr][param_1][14h bytes object] ...

Basically, param_2 is a 14h-byte object.

This is unusual, as the code would normally pass a pointer to the object instead of the object itself. It also makes the code more difficult to analyze, because IDA no longer recognizes the parameter.
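At the source level, this corresponds to a function that takes a structure (or a C++ object) by value. The sketch below is hypothetical: the name go_sketch and the five-dword layout of obj14 are invented only to reach the 14h-byte size, since we don't know the real fields. With a 32-bit x86 compiler, the caller typically reserves room for such an argument on the stack and copies (for C++ objects, copy-constructs) the object there, which would be consistent with the sub esp, 14h followed by the call to "do_newcopy_addref".

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the 14h-byte object: five dwords.
   Only the size comes from the listing (sub esp, 14h). */
struct obj14 {
    uint32_t field[5];
};

static_assert(sizeof(struct obj14) == 0x14, "object occupies 14h bytes");

/* Hypothetical stand-in for "go": param_2 is received by value, so the
   callee works on its own 14h-byte copy on the stack, not on a pointer. */
uint32_t go_sketch(uint32_t param_1, struct obj14 param_2)
{
    param_2.field[0] = 0;              /* changes only the callee's copy */
    return param_1 + param_2.field[4];
}
```

Calling go_sketch(10, o) with o.field = {1, 2, 3, 4, 5} returns 15 and leaves the caller's o untouched, which is why the caller has to materialize a full copy on the stack.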

We are almost done: let's focus on ebp+68h and try to track it back!

.text:1000323E push dword ptr [ebp+78h]
.text:10003241 lea eax, [ebp+68h]
.text:10003244 push eax
.text:10003245 call sub_1000346A

The reasoning is always the same: we see a function with two parameters, one of which is ebp+68h, so we can suppose that the other one, the value at ebp+78h, points to the bytes of the key, and that the function instantiates an object by making a copy of the key itself.

Now we have to follow ebp+78h, which reminds us of the weird ebp+58h parameter we saw before... So, again, we go to the beginning of the function and notice:

.text:100031FA push ebp
.text:100031FB sub esp, 6Ch
.text:100031FE mov eax, offset loc_1006ACBC
.text:10003203 call __EH_prolog

This time the stack will look like this:

... [prolog][6Ch bytes][ebp][ret_addr][param_1][param_2] ...

and

prolog + 6Ch + ebp + ret_addr = 4h + 6Ch + 4h + 4h = 78h

So, ebp+78h points to param_1.

Again, we go via Cross Reference to follow param_1 and see:

.text:100126FB push [ebp+arg_0]
.text:100126FE call sub_100031FA

arg_0 is our target! Another first parameter to follow, another cross-reference to check:

.text:100033BA push [ebp+arg_0]
.text:100033BD call sub_100126D5

But now we are in a very special function:

.text:100033A4 UpdateTBSList proc near

It is an exported function, but even knowing that doesn't help us retrieve the key, as it is not called from within the executable module itself...!

Here is a visual representation of the whole analysis we have done:

I hope this discussion has given you an idea of how much this kind of structured code can complicate things... although we went very deep into the code to track the key back, even at the end of our analysis we didn't find its value!

Are we close to it? Mmm... close enough at least :P

I'm not going to describe every single detail, but let's just think of the next logical step.

You may think about looking for the call to "UpdateTBSList" in the other components of Flame, but you won't find anything because the strings are encrypted! So, first you have to decrypt the strings and then you can look in every component of the malware to find where the export is called :)

But, even knowing that... once you have finally retrieved the key... what is it useful for? Was this time-consuming effort worth it?

Well, it definitely is but, to understand why, you would have to investigate further... :) This "never ending task" makes us think of the direction malware analysis is taking in recent years: a lot of effort, patience and dedication is required to perform even a small analysis like this one!

Scrammed!

Monday, June 11, 2012
