Thursday, October 25, 2012

Tricky Tilon: disappearing instruction, anti-debugging, deceptions and much more!


This post is about a sort of anti-debugging trick that I discovered while analyzing a malware named Tilon. Well, to be precise, it's more a deception trick than an anti-debugging one but, as we will see later, it's really easy to tweak it to tamper with debugging.

Tilon is a banker that was spotted by Trusteer in July 2012. Aside from some pretty standard stuff, like a Man In The Browser implementation, it's best known for its use of several evasion techniques. While digging deeper into its various encryption/packer layers, I found one that hasn't been reported yet.

All the layers are very easy to bypass: they roughly consist of basic crypto operations and UPX compression. After solving them you will see the following listing:


008D0079   CALL 008D0233
008D007E   CALL 008D04DC
008D0083   POPAD
008D0084   ADD ESP,4
008D0087   MOV EAX,DWORD PTR SS:[EBP+4020BF]
008D008D   JMP EAX

Tilon's trick(s)

So, we have two CALLs and one JMP.

If we step into the first call, we will notice that the malware sets a "hook" (it's not proper API hooking, as it only works inside the process itself) on the KiUserExceptionDispatcher API (a function of NTDLL.dll that is called when some types of exception occur) so that it calls what we will discover to be a decryption routine.

Pay attention to the code while stepping... As I mentioned before, Tilon is famous for the number of deception tricks it implements!
For example, the next call makes use of a well-known PEB-related anti-debugging trick:

008D04EB   JNZ SHORT 008D04EE
008D04ED   RETN
008D04EE   INT3

Finally, there is a jump that brings us to encrypted code:

Access violation, indeed!
Since there is no exception handler installed, one may think that the code has crashed for some reason he missed, and restart the debugger to conduct a more careful analysis. On the other hand, if we use Shift+F8 the malware will decrypt the code thanks to the previous "hook"! Finally, it will also remove the "hook" from the KiUserExceptionDispatcher API and jump to the decrypted bytes.

So, this is not really an anti-debugging trick, even if it works fine in some cases, like against emulators. But let's stay focused: our goal is to fool the debugger... How can we do it? Well, there are surely several ways to do that; the one I have in mind consists of mixing this trick, the PEB one and... one more finding!

Let me explain it briefly: when Tilon generates the exception, by pressing Shift+F8 the debugger will execute the first instruction of the "hook", but will break only on the second one.
Thus we have: 

7C91EAEC   68 FA028D00      PUSH 8D02FA ; the first instruction is executed...
7C91EAF1   C3               RETN ; but the debugger will break only here!

That gives us the possibility of hiding an instruction!

Putting the pieces together:

Tilon: you are doing it wrong...!

Now I will show you one way to improve the trick in Tilon's code:

STEP 1. We need to replace the decryption key with a wrong one and to set a global variable (for instance, 008D0AB0) containing the address of the decryption routine (in my case 008D02FA).

STEP 2. We modify the hook by writing: JMP [008D0AB0] (this instruction will be hidden).

STEP 3. We modify the PEB trick (inside the last call before the jump to the encrypted bytes; see the listing at the beginning of this blog-entry) in the following way:

008D0A00   MOV EAX,DWORD PTR FS:[18]
008D0A0D   JNZ SHORT 008D0A19
008D0A0F   MOV DWORD PTR DS:[8D0AB0], 8D0A1A  ; no debugger!
008D0A19   RETN
008D0A1A   MOV BYTE PTR DS:[8D036D],7A ; small snippet in case of no debugger
008D0A21   JMP 008D02FA

When the code jumps to the encrypted bytes, the exception will invoke the "hooked" KiUserExceptionDispatcher API, which:
  • will jump directly to 008D02FA (the decryption routine), using the wrong key, if the debugger is detected;
  • will jump to the small snippet in the listing above, which restores the correct decryption key and then jumps to 008D02FA, otherwise.

In this way, if an analyst doesn't notice the PEB check (stepping over its call, for instance), the bytes won't be decrypted correctly and this will cause a crash.
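The effect of the modified hook can be modeled in a few lines of Python. This is only a toy model under loud assumptions: Tilon's real decryption routine is not reproduced, a single-byte XOR stands in for it, and the names `decrypt`, `hooked_dispatcher` and `WRONG_KEY` are hypothetical. The only value taken from the listing is 7A, the byte that the "no debugger" snippet writes back.

```python
# Toy model of the improved trick; NOT Tilon's real algorithm.
# Assumption: a single-byte XOR stands in for the decryption routine.

CORRECT_KEY = 0x7A   # byte restored by the "no debugger" snippet in the listing
WRONG_KEY = 0x13     # hypothetical wrong key planted in STEP 1


def decrypt(data: bytes, key: int) -> bytes:
    """Stand-in for the decryption routine at 008D02FA."""
    return bytes(b ^ key for b in data)


def hooked_dispatcher(encrypted: bytes, debugger_detected: bool) -> bytes:
    """Model of the hooked dispatcher after STEP 2 and STEP 3."""
    key = WRONG_KEY                 # STEP 1: the stored key is wrong by default
    if not debugger_detected:
        key = CORRECT_KEY           # the small snippet restores the right key
    return decrypt(encrypted, key)  # both paths end in the same routine


payload = b"\x90\x90\xc3"                   # hypothetical plaintext bytes
encrypted = decrypt(payload, CORRECT_KEY)   # XOR is its own inverse

clean_run = hooked_dispatcher(encrypted, debugger_detected=False)
debugged_run = hooked_dispatcher(encrypted, debugger_detected=True)
print(clean_run == payload, debugged_run == payload)
```

Both paths reach the same routine, so nothing in the control flow looks different; only the output of the decryption changes.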

In my opinion, this version of the trick is way better than Tilon's original implementation, but we can improve it much more if we choose to ignore how it was originally structured (which is strongly tied to the exception caused by the execution of the encrypted bytes...).

Another way to implement the trick (my fav one ;))

Another variant of Tilon's original implementation of the trick is the following.

We have to set a global variable (let's say 00BD0AB0) that contains a memory address, depending on the result of the PEB anti-debugging trick. Then, we need to generate an appropriate exception (for example, by writing through a null pointer) and to "hook" the KiUserExceptionDispatcher API by injecting a JMP [00BD0AB0] (the hidden instruction!).
Thus, we have something like:

mov eax, 0
mov [eax], 0
* junk code *

The idea is to use the PEB check to set the global variable 00BD0AB0 to the address of * junk code * if the debugger is detected, and to set it to the right address (where the real code is) otherwise. In this way, the analyst may not notice the "hook" at all, and will use Shift+F8 to continue debugging from the instruction right after the exception.
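The control flow of this variant can be sketched as a toy Python model. Everything here is an assumption made for illustration: the function names are hypothetical, a Python exception stands in for the access violation, and the `except` branch stands in for the hooked dispatcher with its hidden jump.

```python
# Toy control-flow model of the second variant; all names are hypothetical.

def real_code() -> str:
    return "real code"


def junk_code() -> str:
    return "junk code"


def run_sample(debugger_detected: bool) -> str:
    # PEB check: decides what the global variable (00BD0AB0 in the text) holds.
    global_target = junk_code if debugger_detected else real_code
    try:
        # Stand-in for the deliberate fault (writing through a null pointer).
        raise MemoryError("access violation")
    except MemoryError:
        # Stand-in for the hooked dispatcher: the hidden jump through the global.
        return global_target()


print(run_sample(debugger_detected=False))
print(run_sample(debugger_detected=True))
```

The fault and the dispatcher are identical in both runs; the only difference is the value stored in the global variable long before the exception happens.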

The following diagram will clarify the procedure:

Of course you can choose a different (more subtle and less visible) way to generate the exception, but the point is that the * junk code * can really confuse the analysis, which can be very time consuming from the analyst's perspective. For instance, you can insert some junk code and then terminate the process, or anything else.

Moreover, it's less detectable than setting the jump directly in the PEB check, because of the hidden instruction and the fact that debugging continues after the exception itself, as you would normally expect.

Note also that the PEB check is only one of the possible tricks to detect the debugger and you can obviously choose a different one!


You can use this technique on its own, or mix it with other tricks. If you choose to combine different tricks together, the risk of being detected will increase... but so will the number of possible uses you can make!

Thursday, October 11, 2012

Some notes about the pdf exploits in Blackhole 2.0

Recently we have been hearing a lot about Blackhole 2.0, the latest edition of the popular exploit kit, so I started looking around to gather some more information. In particular, I searched for some websites hosting it and found a pdf file that caught my attention (you can find it in <blackhole_host>/data/t.pdf).

The curious thing about it is that it doesn't contain any malicious code and if we look closer we understand that it's only a sort of skeleton for the real malicious pdf.

In fact, just analyzing the raw bytes we see the following streams:

3 0 obj<<%data%/CreationDate(%title%)>>

42 0 obj<</Length 504/Filter[/FlateDecode]/Type/EmbeddedFile>>stream

43 0 obj<</Length 1313/Filter/FlateDecode/Type/EmbeddedFile>>stream

This suggests that the malicious pdf may be built at runtime: it seems that the fields %data%, %title%, %config% and %js% are filled each time with data related to a different exploit, depending on the vulnerability found on the victim's system. Moreover, this is a novelty for the Blackhole exploit kit, as previous versions didn't use a similar approach.
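From the analyst's side, a quick way to recognize such a skeleton is to look for unfilled %name% tokens in the raw bytes. The following is a minimal sketch; the function name `find_placeholders` and the regular expression are my own assumptions, and the test input is just the first object shown above.

```python
import re

# Matches %name% tokens such as %data% or %title% in raw PDF bytes.
PLACEHOLDER = re.compile(rb"%([A-Za-z_]+)%")


def find_placeholders(raw: bytes) -> list[str]:
    """Return the distinct placeholder names, in order of first appearance."""
    seen: list[str] = []
    for match in PLACEHOLDER.finditer(raw):
        name = match.group(1).decode("ascii")
        if name not in seen:
            seen.append(name)
    return seen


skeleton = b"3 0 obj<<%data%/CreationDate(%title%)>>"
print(find_placeholders(skeleton))   # ['data', 'title']
```

A file served to a real victim should return an empty list, since the kit has already replaced every token.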

So, I investigated further: I searched for some live exploit urls to perform a real infection and captured the traffic with Wireshark. I then extracted the pdf file from the capture and started analyzing it.

To do that I used a utility named PDFStreamDumper, which successfully decompresses the streams (note that some alternatives, such as pdftk, failed, maybe because the file was intentionally corrupted to make inspection more difficult).

The important streams are the same as the ones listed above, but in this case they are filled with some data (they are reported in a slightly different notation because I had to decompress them). Here they are, together with a brief explanation: 


/Keywords(3a3p3p1h3a3l3e3r40233e423e3n401h403a3r3g3e401h3c3r3e3a403i3o3n2a3a403e1h3r3e3p3l3a3c3e1b1i1f1i3g1f1a1a1c21423a3r133p3a3d3d3i3n3g21423a3r133b3b3b1f13… **ENCRYPTED EXPLOIT BYTES** …383j1l1b383l3l1l1c21433i403h1b473k20383l3l1m491c382f1j1b3k1c212f3m3a3g3e2c3i3e3l3d1k1h3r3a432s3a3l413e23383l3l1k49383j1m1b1c21)/CreationDate(6683e4fcfc85e47534e95f33c0648b40308b400c8b701c568b760833db668b5e3c0374332c81ee1510ffffb88b4030c346390675fb87342485e47551e9eb4c51568b753c8b74357803f5… **SHELLCODE BYTES** ...6363636d7477723d3033303333333034333430383335333830393035266c71786d746e66623d30332668657a6e647865663d746c796d6626717666707870656f3d75777462730000)

This stream contains both the encrypted javascript exploit and the shellcode.
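The /CreationDate value is plain hexadecimal, so its visible tail can be decoded directly. The sketch below only uses the digits that follow the elided part of the listing. Reading the result as "name=value" pairs is my own interpretation, and the first name may be truncated, since the bytes before it are not shown.

```python
# Hex digits copied from the visible tail of the /CreationDate value above.
tail_hex = (
    "6363636d7477723d3033303333333034333430383335333830393035"
    "266c71786d746e66623d3033"
    "2668657a6e647865663d746c796d66"
    "26717666707870656f3d7577746273"
    "0000"
)

raw = bytes.fromhex(tail_hex)
text = raw.rstrip(b"\x00").decode("ascii")   # drop the trailing NUL bytes

# Assumption: the text is a list of name=value pairs separated by '&'.
params = dict(pair.split("=", 1) for pair in text.split("&"))
print(text)
print(params)
```

The printable text at the end of the shellcode looks like a query string, which would fit a shellcode that downloads its payload from a URL, but the listing alone does not confirm that.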


<config xmlns="" xmlns:xfa=""><trace><area level="1" name="font"></area></trace><agent name="designer"><!--  [0..n]  --><destination>pdf</destination><pdf><!--  [0..n]  --><fontInfo></fontInfo></pdf></agent><present><!--  [0..n]  --><pdf><!--  [0..n]  --><fontInfo><embed>1</embed></fontInfo><version>1.6</version><creator>Adobe Designer 7.0</creator><producer>Adobe Designer 7.0</producer><scriptModel>XFA</scriptModel><interactive>1</interactive><tagged>1</tagged><compression><level>6</level><compressLogicalStructure>1</compressLogicalStructure></compression></pdf><xdp><packets>*</packets></xdp><destination>pdf</destination></present><acrobat><acrobat7><dynamicRender>forbidden</dynamicRender></acrobat7><common><locale></locale><data><incrementalLoad></incrementalLoad><adjustData></adjustData><xsl><uri></uri></xsl><outputXSL><uri></uri></outputXSL></data><template><base>C:\</base><relevant></relevant><uri></uri></template></common></acrobat></config>

This stream contains some xml data.


<!--&lt;template>--><template><subform layout="tb" locale="ru_RU" name="form1"><pageSet><pageArea id="Page1" name="Page1"><contentArea h="10.5in" w="8in" x="0.25in" y="0.25in"></contentArea><medium long="11in" short="8.5in" stock="letter"></medium></pageArea></pageSet><subform h="10.5in" w="8in"><field h="98.425mm" name="ImageField1" w="28.575mm" x="95.25mm" y="19.05mm"><ui><imageEdit></imageEdit></ui><caption placement="bottom" reserve="5mm"><font typeface="Myriad Pro"></font><para vAlign="middle"></para><value><text>Image Field</text></value></caption><border xmlns=""><edge presence="hidden"></edge><edge stroke="dotted"></edge><edge stroke="dotted"></edge><edge stroke="dashed"></edge><corner stroke="dotted"></corner><corner stroke="dotted"></corner><corner stroke="dashed"></corner><fill><pattern type="crossDiagonal"></pattern></fill></border><event xmlns:xfa="" activity="initialize">
<xfa:script contentType='application/x-javascript'>
</xfa:script></event></field></subform><proto></proto></subform><?templateDesigner DefaultLanguage FormCalc?><?templateDesigner DefaultRunAt client?><?templateDesigner Grid show:1, snap:1, units:0, color:ff8080, origin:(0,0), interval:(125000,125000)?><?templateDesigner Rulers horizontal:1, vertical:1, guidelines:1, crosshairs:0?><?templateDesigner Zoom 76?></template>

This stream contains the script that decrypts the exploit itself.

To decrypt the exploit, you can use the following html page ("z" contains the encrypted bytes):


<title>Decrypted Exploit</title>

<script language="javascript">

var z;
var s;
z = "3a3p3p1h3a3l3e3r40233e423e3n401h403a3r3g3e401h3c3r3e3a403i3o3n2a3a403e1h3r3e3p3l3a3c3e1b1i1f1i3g1f1a1a1c21423a3r133p3a3d3d3i3n3g21423a3r133b3b3b1f13… **ENCRYPTED EXPLOIT BYTES** …383j1l1b383l3l1l1c21433i403h1b473k20383l3l1m491c382f1j1b3k1c212f3m3a3g3e2c3i3e3l3d1k1h3r3a432s3a3l413e23383l3l1k49383j1m1b1c21";
s = "";
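for(i=0; i < z.length; i+=2) {
  // each pair of characters is one number written in base 29 (0x1d)
  s = String.fromCharCode(parseInt(z[i]+z[1+i], 0x1d));
  document.write(s);
  // The end of this listing was lost. Assumption: it only started a new
  // line after each ';' to make the decrypted script readable.
  if(s == ';')
    document.write("<br>");
}

</script>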



Which leads to the following well known vulnerability (CVE-2010-0188):

  _j8='SUkqADggAABB'; // * base64 representation of a TIFF header! *
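The comment in the listing is easy to verify: decoding the base64 string gives the first bytes of a little-endian TIFF file. A minimal check with the standard library (the variable names are mine):

```python
import base64

prefix = base64.b64decode("SUkqADggAABB")

byte_order = prefix[:2]                              # b"II" means little-endian
magic = int.from_bytes(prefix[2:4], "little")        # 42 identifies a TIFF file
ifd_offset = int.from_bytes(prefix[4:8], "little")   # offset of the first IFD

print(byte_order, magic, hex(ifd_offset))            # b'II' 42 0x2038
```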

I also gathered some other malicious pdf files and found that they are always structured in the same way: the decryption script may change a little (for example, I found "0x1C" instead of "0x1D" as the numerical base used to interpret the bytes), but the method itself is very similar.
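Since only the base changes between samples, the decoding is convenient to keep as a small offline helper with the base as a parameter. This is a sketch equivalent to the html page, not code taken from the kit; the name `decode_pairs` is hypothetical, and the sample is the first 18 characters of the /Keywords value.

```python
def decode_pairs(encoded: str, base: int = 0x1D) -> str:
    """Decode a string made of two-digit numbers written in the given base."""
    chars = []
    for i in range(0, len(encoded) - 1, 2):
        chars.append(chr(int(encoded[i:i + 2], base)))
    return "".join(chars)


sample = "3a3p3p1h3a3l3e3r40"   # first 18 characters of the /Keywords value
print(decode_pairs(sample))      # app.alert
```

For a sample that uses the other base, call `decode_pairs(data, 0x1C)`.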

Saturday, September 29, 2012

Cleaning off anti-disassembly code: the IDC way.

Any code that can be executed can also be reversed. However, there are some tricks that make this task more time-consuming, more intricate, and more difficult to turn into an automated analysis.

What's the problem? A sequence of executable code can be disassembled in many different ways, so disassemblers have to rely on heuristics that, by their nature, have limitations. Anti-disassembly techniques take advantage of them!

When executable code is disassembled, each byte belongs to the representation of one, and only one, instruction at a time. So, if the disassembler is forced to decode an instruction starting from the wrong offset, the instruction it shows won't match the one being executed.

In this article I will discuss one of these tricks. The trick itself is not really a big deal as it's a well known one, but I found it to be very annoying as I had to deal with it recently, in the attempt of reversing a malware. For this reason, I decided to automate the task of cleaning off the code and developed a little IDC script, based on some assumptions I'm going to explain.

Let's consider a code in the following form:

call loc_40106A
401050: db 'string01', 0
40106A: * assembly instructions *
......: .....

It will be disassembled as:

call loc_401050+0A
401050: * assembly instructions corresponding to the string interpreted as a code *
40106A: * assembly instructions *
......: .....

(This is an approximate representation: note that the bytes of the string interpreted as opcodes may unalign the instruction at 40106A)

The idea behind the trick is very simple: IDA is unable to distinguish between text and code and makes wrong assumptions while disassembling the executable. In particular, IDA doesn't realize that, following the control-flow, the bytes it interpreted as code (right after the call instruction) are never executed and, thus, they are only text.

Moreover, it is very disturbing from the analyst perspective as it basically hides strings, making it useless to search for them, and also results in some weird disassembled instructions, that complicate the listing.

So, what's the idea to clean off the code?

First, we have to search for a call with a short displacement: the "E8" opcode followed by a 4-byte operand, which we assume has a value between 1 and 100.

Once we find a similar situation, we undefine the bytes of code corresponding to the string with "MakeUnknown", and then we use "MakeStr" to recompose the original string correctly.
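Outside IDA, the same search can be tried on a raw byte buffer. The following Python sketch is not the IDC script: the function name `find_inline_strings` is hypothetical, and the check that the skipped bytes are printable and NUL-terminated is an extra assumption of mine to reduce false positives. The sample bytes are those of the listing at 00403E2A discussed further down.

```python
import struct


def find_inline_strings(code: bytes, base: int, max_len: int = 100):
    """Yield (call_address, target_address, text) for each call over a string."""
    i = 0
    while i + 5 <= len(code):
        if code[i] == 0xE8:
            # The operand is a signed 32-bit displacement from the next instruction.
            (rel,) = struct.unpack_from("<i", code, i + 1)
            if 1 < rel < max_len and i + 5 + rel <= len(code):
                blob = code[i + 5:i + 5 + rel]
                # Accept only a NUL-terminated run of printable ASCII characters.
                if blob[-1] == 0 and all(32 <= b < 127 for b in blob[:-1]):
                    yield base + i, base + i + 5 + rel, blob[:-1].decode("ascii")
                    i += 5 + rel
                    continue
        i += 1


sample = bytes.fromhex("E8090000006964656E7469747900578D838D284000FFD083C7082BCF")
for call_ea, target_ea, text in find_inline_strings(sample, 0x403E2A):
    print(f"{call_ea:08X} -> {target_ea:08X} {text!r}")
```

This only locates the strings and the real call targets; it does nothing about the realignment of the code that follows them.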

Unfortunately, even if this procedure solves most of the cases, it isn't general enough to solve all of them.

Let's consider the following example:

seg000:00403E2A E8 09 00 00 00                          call    near ptr loc_403E37+1
seg000:00403E2F 69 64 65 6E 74 69 74 79                 imul    esp, [ebp+6Eh], 79746974h
seg000:00403E37                         loc_403E37:                            
seg000:00403E37 00 57 8D                                add     [edi-73h], dl
seg000:00403E3A 83 8D 28 40 00 FF D0                    or      dword ptr [ebp-0FFBFD8h], 0FFFFFFD0h
seg000:00403E41 83 C7 08                                add     edi, 8
seg000:00403E44 2B CF                                   sub     ecx, edi

After the procedure described above, it will become:

seg000:00403E2A E8 09 00 00 00                          call    near ptr unk_403E38
seg000:00403E2F 69 64 65 6E 74 69 74 79+aIdentity       db 'identity',0
seg000:00403E38 57                      unk_403E38      db  57h 
seg000:00403E39 8D                                      db  8Dh 
seg000:00403E3A 83 8D 28 40 00 FF D0                    or      dword ptr [ebp-0FFBFD8h], 0FFFFFFD0h
seg000:00403E41 83 C7 08                                add     edi, 8
seg000:00403E44 2B CF                                   sub     ecx, edi

As you can see, IDA has also undefined some bytes after the string (which should have been legitimate assembly instructions); so, to solve the problem, we may think of going to the end of the string and converting everything back into code, using "MakeCode". But is that enough?

seg000:00403E2A E8 09 00 00 00                          call    loc_403E38
seg000:00403E2F 69 64 65 6E 74 69 74 79+aIdentity       db 'identity',0
seg000:00403E38                         loc_403E38:                             
seg000:00403E38 57                                      push    edi
seg000:00403E39 8D                                      db  8Dh 
seg000:00403E3A 83 8D 28 40 00 FF D0                    or      dword ptr [ebp-0FFBFD8h], 0FFFFFFD0h
seg000:00403E41 83 C7 08                                add     edi, 8
seg000:00403E44 2B CF                                   sub     ecx, edi

No, it isn't! IDA tries to convert the bytes into code, but if they are unaligned it might start from the wrong offset or even fail to complete the task. This happens because IDA originally disassembled the string bytes as code, which led it to disassemble the following bytes into incorrect instructions like the above "or  dword ptr [ebp-0FFBFD8h]". Because of this, when you try to realign the code and build the correct instruction starting from the byte "8D", IDA fails: the instruction starting at "8D" needs bytes that belong to the "or" instruction, and IDA won't break an instruction that already exists.

Even undefining some of the bytes after the string and then trying to translate them back into code doesn't work because, by the nature of the problem, you don't know exactly how many bytes you should consider. You can use a heuristic and try with about ten bytes, but this solution doesn't always give accurate results. Moreover, if there's another call in those ten bytes, things get even worse!

The basic idea to try to solve the problem is to manually parse the raw bytes. I wrote a little IDC script to do that; here is a brief explanation of how it works:

  • It searches for the "0xE8" byte and takes its operand, that is the size of the string.
  • It undefines the subsequent "size" bytes and recomposes them as a string.
  • Then, it iterates the following procedure:

    1. It tries to undefine a byte and to make an instruction from it (after the first execution of this step, the undefining operation won't have any effect in case of step 3.).

    2. If the instruction is made, go back to 1. and continue with the bytes after the instruction.

    3. If the instruction wasn't made, undefine one more byte (to a maximum of 16), in the attempt at making it, and repeat from step 1.

In this way, bytes are undefined one at a time, and the corresponding instruction is rebuilt as soon as it becomes possible.

I also assumed that if there are four subsequent instructions that are already interpreted as code, then the bytes are aligned and the work is done. Finally, if any of the recomposed instructions is a call, the algorithm will start over again.

Here is the final script (change "MIN" and "MAX" with the addresses that define the range of the code in which you want to run the script - e.g. MIN = 0x00401000, MAX = 0x00404FFF):

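// Note: the braces, the "break" statements, the "else" branch and the
// DeltaUndef increment were lost from this listing; they are reconstructed
// here from the steps described above, so treat them as an assumption.
auto i, j;
auto Size;
auto Delta, DeltaTemp, DeltaUndef;

for(i = MIN; i < MAX; i++) {
   if(Byte(i) == 0xE8) {                         // call opcode
       Size = Dword(i + 1);                      // operand = size of the string
       if(Size > 1 && Size < 100) {
            Message(" %08x \n ", i);
            MakeUnknown(i+5, Size,  DOUNK_DELNAMES);
            MakeStr(i+5, i+5+Size);
            Delta = 0;
            DeltaUndef = 0;
            // done once four subsequent instructions have been made
            for(j = 0; j < 4; j++) {
                 if(Byte(i+5+Size + Delta) == 0xE8)
                     break;                      // another call: left to the outer loop
                 MakeUnknown(i+5+Size + Delta, 1, DOUNK_DELNAMES);
                 DeltaTemp = MakeCode(i+5+Size + Delta);
                 if(DeltaTemp != 0) {            // instruction made: move past it
                     Delta = Delta + DeltaTemp;
                     DeltaUndef = 0;
                 } else {                        // not made: undefine one more byte
                    j = 0;
                    DeltaUndef++;
                    if(DeltaUndef > 16)
                        break;
                    if(Byte(i+5+Size + Delta + DeltaUndef) == 0xE8)
                        break;
                    MakeUnknown(i+5+Size + Delta + DeltaUndef, 1, DOUNK_DELNAMES);
                    Message("-- Undef %08x \n", i+5+Size + Delta + DeltaUndef);
                 }
            }
       }
   }
}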

P.S. I know the code looks twisty and more complicated than it needs to be, but I had to write it this way to bypass some glitches I was having with a couple of IDC APIs.

Monday, August 27, 2012

Analysing CVE-2012-4681 (latest Java 0day)

Yesterday I spotted the news about a new Java 0day being exploited in the wild and soon after a POC was released:
I decided to analyse this code to understand which vulnerability the exploit relies on. Here is a brief description of my findings.

The code instantiates a Statement object that will be used to run the setSecurityManager() method of the System class. The purpose is to set the Security Manager to null, which means escaping the Java sandbox. Of course, you can't do this directly and here comes the exploit!

The Statement object contains a field named "acc", which is an AccessControlContext (a sort of security descriptor) that specifies the permissions allowed for the Statement object itself. This field is normally not accessible from the code outside the Statement class, so the exploit needs to find a way to modify it.

It does so by using the getField() method of the sun.awt.SunToolkit object: this function returns a given field from a given object; in this case it returns Statement.acc. At this point the game is over because the malicious code can just create a new AccessControlContext object, assign to it full permissions and then replace the old restricted Statement.acc with the new unrestricted one.

Mystery solved? Not yet: the tricky part is obtaining an instance of sun.awt.SunToolkit, which is supposed to live in a restricted package. The exploit does this by calling Class.forName(); this method simply returns the Class object for a given class name.

This is how I understand the code (and I'm no Java expert), but I read this blog entry that has a slightly different explanation. In their analysis, the authors see another method that accomplishes the task:  com.sun.beans.finder.ClassFinder.
I don't know what this is about: do they have a different POC or sample? It does seem so!

Also they say that the exploit itself relies on the possibility of instantiating the sun.awt.SunToolkit object through the com.sun.beans.finder.ClassFinder object. This would mean that in the POC I have analysed the vulnerability is in the Class.forName() method, that is, there are TWO different vulnerabilities (one in ClassFinder and one in Class.forName()).

However, when I debugged the exploit in Java version 1.6 (jre6), it did not work: the Class.forName() method successfully instantiated the sun.awt.SunToolkit object, but then the use of its getField() method threw an exception. Instead, the method works fine in version 1.7 (jre7). To make it short:

So, even if version 1.6 allows the instantiation of the sun.awt.SunToolkit object, it prevents it from accessing the private Statement.acc field, which seems correct. It seems that the bug is really in version 1.7, in the access to Statement.acc. Or maybe neither of the two is supposed to happen: sun.awt.SunToolkit must not be instantiated by restricted code, and the Statement.acc field must not be accessed by anyone.

I will look forward to new results.

*UPDATE* [28 August 2012]
1) Now we can refer to the above vulnerability as "CVE-2012-4681".
2) A new analysis, based on the same POC I documented, has been published today. So, yes, it seems that getField() is the culprit, or at least it's one of them...

*UPDATE 2* [28 August 2012]
A more in-depth analysis is finally out:
As I thought, there are two different 0days: one that allows you to get a reference to the restricted class sun.awt.SunToolkit, and the other one (getField()) that lets you access a private field of a class. The missing detail (classFinder()) is also solved: it is used in the internal implementation of the execute() method of the Expression object.

Friday, August 17, 2012

FinFish's trick... not so legendary!

This post is about a trick that FinFish uses to appear (well, at least, "to try to appear"!) as a normal, non-malicious program.
First of all, you can immediately notice that this sample is a simple loader: a look at the IDA navigation bar shows a tiny code section in contrast to a huge resource section.

This tells us that something is hidden somewhere in the resources. The payloads, in fact, are encrypted and stored in the dialog type resources. 
Here's a quick verification test, that shows that something is wrong in the dialog data: 

But let's go back to the curious trick we mentioned, and let's begin by analyzing the code. If we start looking from the entry point we notice... absolutely nothing! 
At a first glance nothing suggests that we are analyzing a malware, as we only go through some common APIs.

Of course, a deeper reading reveals the trick: in the middle of some legitimate calls we find a suspicious function. The thing that caught my attention the most is that it calls the VirtualProtect API several times, apparently without any good reason, as we will see later.

For now let's start from the beginning:

.text:004011F5                 push    0               ; lpModuleName
.text:004011F7                 call    ds:GetModuleHandleW
.text:004011FD                 mov     ebp, eax        ; MZ header
.text:004011FF                 mov     eax, [ebp+3Ch]  ; MZ.e_lfanew = PE offset
.text:00401202                 mov     esi, [eax+ebp+80h] ; import table RVA
.text:00401209                 mov     eax, [esi+ebp+0Ch] ; import name RVA

This code gets the handle of the application itself and then it reads: the MZ header; the PE offset; the import table RVA; the first import name RVA.
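The same two reads can be reproduced on a byte buffer. This is a minimal sketch with a hypothetical function name and a fake, heavily truncated image; note that the 80h offset used by the sample is only valid for 32-bit (PE32) images, where the import entry of the data directory sits 80h bytes after the PE signature.

```python
import struct


def import_table_rva(image: bytes) -> int:
    """Return the import directory RVA of a 32-bit PE image."""
    if image[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    (e_lfanew,) = struct.unpack_from("<I", image, 0x3C)          # [ebp+3Ch]
    if image[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    (rva,) = struct.unpack_from("<I", image, e_lfanew + 0x80)    # [eax+ebp+80h]
    return rva


# Hypothetical image: only the fields read above are filled in.
fake = bytearray(0x200)
fake[0:2] = b"MZ"
struct.pack_into("<I", fake, 0x3C, 0x80)           # e_lfanew
fake[0x80:0x84] = b"PE\x00\x00"
struct.pack_into("<I", fake, 0x80 + 0x80, 0x2000)  # import directory RVA

print(hex(import_table_rva(bytes(fake))))          # 0x2000
```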

.text:0040120D                 add     esi, ebp ; virtual address of the image import descriptor
.text:00401223                 add     eax, ebp
.text:00401225                 push    offset aUser32_dll_0 ; "user32.dll"
.text:0040122A                 push    eax             ; char *
.text:0040122B                 call    __stricmp
.text:00401230                 add     esp, 8
.text:00401237                 add     esi, 14h
.text:0040123A                 mov     [esp+18h+var_4], esi
.text:0040123E                 jmp     loc_4012E6
.text:004012E6                 mov     eax, [esi+0Ch]
.text:004012E9                 test    eax, eax
.text:004012EB                 jnz     loc_401223

Then the malware computes the virtual address of the first image import descriptor and of its import name, and begins a loop over the import names looking for "user32.dll".

.text:00401243                 mov     edi, [esi]
.text:00401245                 mov     esi, [esi+10h]
.text:00401248                 mov     eax, [edi+ebp]
.text:0040124B                 add     edi, ebp
.text:0040124D                 add     esi, ebp
.text:00401257                 jmp     short loc_401260
.text:00401260                 lea     ecx, [eax+ebp+2] ; Name
.text:00401264                 push    offset aRegisterclasse ; "RegisterClassExW"
.text:00401269                 push    ecx             ; char *
.text:0040126A                 call    __stricmp
.text:0040126F                 add     esp, 8
.text:00401272                 test    eax, eax
.text:00401274                 jnz     short loc_401297
.text:00401297                 mov     edx, [edi]
.text:00401299                 lea     eax, [edx+ebp+2]
.text:0040129D                 push    offset aCreatewindowex ; "CreateWindowExW"
.text:004012A2                 push    eax             ; char *
.text:004012A3                 call    __stricmp
.text:004012A8                 add     esp, 8
.text:004012AB                 test    eax, eax
.text:004012AD                 jnz     short loc_4012D0
.text:004012D0                 mov     eax, [edi+4]
.text:004012D3                 add     edi, 4
.text:004012D6                 add     esi, 4
.text:004012D9                 test    eax, eax
.text:004012DB                 jnz     short loc_401260

Here the code saves the content of the OriginalFirstThunk and the FirstThunk fields of the IMAGE_IMPORT_DESCRIPTOR. Then, it loops over every IMAGE_IMPORT_BY_NAME.Name looking for the RegisterClassExW and the CreateWindowExW APIs.
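The offsets used in the listings (0, 0Ch, 10h, and the 14h step between entries) match the layout of IMAGE_IMPORT_DESCRIPTOR, which can be written down as a small Python sketch. The class and function names are mine, and the RVAs in the example are made up.

```python
import struct
from typing import NamedTuple


class ImportDescriptor(NamedTuple):
    original_first_thunk: int  # +00h: RVA of the name table (mov edi, [esi])
    time_date_stamp: int       # +04h
    forwarder_chain: int       # +08h
    name: int                  # +0Ch: RVA of the DLL name (mov eax, [esi+0Ch])
    first_thunk: int           # +10h: RVA of the address table (mov esi, [esi+10h])


DESCRIPTOR_SIZE = 0x14         # matches "add esi, 14h" in the loop


def parse_descriptor(raw: bytes) -> ImportDescriptor:
    """Unpack one 20-byte descriptor made of five little-endian DWORDs."""
    return ImportDescriptor(*struct.unpack("<5I", raw[:DESCRIPTOR_SIZE]))


# Hypothetical descriptor with made-up RVAs.
raw = struct.pack("<5I", 0x3000, 0, 0, 0x3100, 0x3200)
desc = parse_descriptor(raw)
print(hex(desc.name), hex(desc.first_thunk))
```

The "+2" in `lea ecx, [eax+ebp+2]` skips the 2-byte Hint field that precedes the name in each IMAGE_IMPORT_BY_NAME entry.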

Once they are found it does the following:


.text:00401276                 lea     edx, [esp+18h+flOldProtect]
.text:0040127A                 push    edx             ; lpflOldProtect
.text:0040127B                 push    40h             ; flNewProtect
.text:0040127D                 push    4               ; dwSize
.text:0040127F                 push    esi             ; lpAddress
.text:00401280                 call    ebx ; VirtualProtect
.text:00401282                 lea     eax, [esp+18h+flOldProtect]
.text:00401286                 push    eax             ; lpflOldProtect
.text:00401287                 mov     dword ptr [esi], offset BadFunc1 ; FirstThunk overwrite
.text:0040128D                 mov     ecx, [esp+1Ch+flOldProtect]
.text:00401291                 push    ecx             ; flNewProtect
.text:00401292                 push    4               ; dwSize
.text:00401294                 push    esi             ; lpAddress
.text:00401295                 call    ebx ; VirtualProtect


.text:004012AF                 lea     ecx, [esp+18h+flOldProtect]
.text:004012B3                 push    ecx             ; lpflOldProtect
.text:004012B4                 push    40h             ; flNewProtect
.text:004012B6                 push    4               ; dwSize
.text:004012B8                 push    esi             ; lpAddress
.text:004012B9                 call    ebx ; VirtualProtect
.text:004012BB                 lea     edx, [esp+18h+flOldProtect]
.text:004012BF                 push    edx             ; lpflOldProtect
.text:004012C0                 mov     dword ptr [esi], offset BadFunc2 ; FirstThunk overwrite
.text:004012C6                 mov     eax, [esp+1Ch+flOldProtect]
.text:004012CA                 push    eax             ; flNewProtect
.text:004012CB                 push    4               ; dwSize
.text:004012CD                 push    esi             ; lpAddress
.text:004012CE                 call    ebx ; VirtualProtect

Basically, it changes the protection of the memory containing the import addresses, using the VirtualProtect API; then it overwrites the FirstThunk entry, related to the RegisterClassExW and CreateWindowExW APIs, with a malicious offset.

In this way, every time one of these APIs is called it won't be executed and, instead, the code located at the malicious offset will be run. Even debugging the code, if we don't step into the calls, nothing will suggest that the code is being hijacked.

As we can see the ones above seem to be normal, legitimate, calls, but they are really hijacked to the malicious routines. And here is the trick in action in the debugger:

Note that this is not API hooking, but only a simple trick that works in the executable itself: it's not the API code being overwritten, it is the FirstThunk of the malicious executable.

What can I say... It's not a very advanced deception trick, but a curious one at least: come on guys, you can do better!

Monday, June 11, 2012

Why Flame is a pain to analyze - a look at its intricate compilation style.

This post is about some peculiarities of the assembly code of Flame, the malware infiltrating Iranian computers. Note that I'm not going to give you any additional details or new findings about its analysis; if you are interested in this kind of stuff, I suggest you read the report written by CrySyS, which is by far the most comprehensive description available of its different components.
Aside from that, it should be noted that although the main functionalities of Flame have been identified, there's still a lot of undocumented code. So I hope that, for those of you who want to perform their own analysis, it will be helpful to understand more about its compilation style, and that's why I'm writing these little notes.
In order to do that I decided to discuss a specific routine in the "advnetcfg.ocx" file: the RC4 encryption routine. In particular, I focused on the attempt to retrieve the key.
Although I'm not the first one to find it, as it also appears in the CrySyS report cited above (without a description of the procedure), the aim of this post is to show you how a standard task like that is made intricate and time-consuming by the compilation style.
This is only one example to highlight this kind of structured code, as you will find it all over the malware. Of course, this isn't the only peculiarity that makes its code more difficult to understand: maybe there will be a sequel to continue this discussion.
First, we will describe how to deal with the RC4 algorithm in order to identify which parameter is used for the key but, even knowing that, it won't be enough for finding its content directly and we will be going through some intricate code to finally reveal its value.
Let's get it started.
Analyzing RC4
Taking a look at the code, we notice the following loop:
.text:1002598F                 mov     [eax+ecx], al
.text:10025992                 inc     eax
.text:10025993                 cmp     eax, 100h
.text:10025998                 jl      short loc_1002598F
It is a typical hint to recognize the RC4 algorithm, as it composes a 0x100 (= 256 dec) bytes array, that is the initial permutation box. Just compare it to one of the RC4 source codes available online (this, for instance), and look for the Assembly-C correspondence:

for (i = 0; i < 256; i++)
state->perm[i] = (u_char)i;
Then we can see another clear sign of RC4:   
.text:1002599C                 mov     [ecx+100h], dl 
.text:100259A2                 mov     [ecx+101h], dl
It obviously refers to:
state->index1 = 0;
state->index2 = 0;
Putting these lines together we get the RC4 "state" structure, which belongs to the "rc4_init" function. You can also notice that the "rc4_crypt" function is reported in the following lines, as probably the code was just copied from a source similar to the one we are referring to.
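
As a rough sketch of what these two hints imply, the "state" structure can be pictured as a flat buffer: 0x100 permutation bytes followed by the two index bytes at offsets 0x100 and 0x101. The Python below only models that layout; the constant and function names are mine, and the total size of 0x102 bytes is an assumption based on the two offsets seen in the listing.

```python
PERM_SIZE = 0x100    # 256-byte permutation box
INDEX1_OFF = 0x100   # written by: mov [ecx+100h], dl
INDEX2_OFF = 0x101   # written by: mov [ecx+101h], dl
STATE_SIZE = 0x102   # assumption: nothing else follows the two indexes

def new_state():
    state = bytearray(STATE_SIZE)
    # Counterpart of the loop at 1002598F: perm[i] = i for i in 0..255
    for i in range(PERM_SIZE):
        state[i] = i
    state[INDEX1_OFF] = 0
    state[INDEX2_OFF] = 0
    return state

state = new_state()
print(len(state), state[0], state[0xFF], state[INDEX1_OFF], state[INDEX2_OFF])
```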
We also know that the prototype of the "rc4_init" function is:
void rc4_init(struct rc4_state *const state, const u_char *key, int keylen);
But in the assembly code we see only two parameters:

.text:10025986 arg_0           = dword ptr  8
.text:10025986 arg_4           = dword ptr  0Ch
This is weird! It means that one of them is missing: why? For the moment let's just say that the answer is related to the intricate nature of the code that I will clarify later.
First let's look for the code that uses the key. In the C code we have:
j += state->perm[i] + key[i % keylen];
We are interested in finding an Assembly correspondence for the last addend:
.text:100259DA                 idiv    [ebp+arg_4]
This tells us that arg_4 is the key length. Moreover:
.text:100259DD                 inc     [ebp+var_8]
.text:100259E0                 cmp     [ebp+var_8], 100h
.text:100259E7                 jl      short loc_100259AF
So, var_8 in the Assembly code is the counter i in the C code, and to find the key we have to look for an Assembly instruction reading one byte from the memory. This consideration leads us to:
mov     bl, [esi+edi]
We are indeed interested in edi that comes from arg_0:

.text:100259B2                 mov     edi, [ebp+arg_0]
that is... the key!
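
For reference, here is a minimal Python sketch of the textbook RC4 that the C source describes. It is not Flame's code, and the function names simply mirror the C ones. The `i % len(key)` expression is the piece that the listing implements with `idiv [ebp+arg_4]`, and the byte read from `key` corresponds to `mov bl, [esi+edi]`. The key used below is the well-known public test vector, not the key we are hunting for.

```python
def rc4_init(key):
    """Key schedule: returns a dict that mirrors the C 'state' structure."""
    perm = list(range(256))
    j = 0
    for i in range(256):
        # key[i % keylen]: the modulo is what idiv [ebp+arg_4] computes
        j = (j + perm[i] + key[i % len(key)]) & 0xFF
        perm[i], perm[j] = perm[j], perm[i]
    return {"perm": perm, "index1": 0, "index2": 0}

def rc4_crypt(state, data):
    """Encrypts or decrypts data, updating the state in place."""
    perm = state["perm"]
    out = bytearray()
    for byte in data:
        state["index1"] = (state["index1"] + 1) & 0xFF
        state["index2"] = (state["index2"] + perm[state["index1"]]) & 0xFF
        i, j = state["index1"], state["index2"]
        perm[i], perm[j] = perm[j], perm[i]
        out.append(byte ^ perm[(perm[i] + perm[j]) & 0xFF])
    return bytes(out)

# Public test vector: key "Key", plaintext "Plaintext"
cipher = rc4_crypt(rc4_init(b"Key"), b"Plaintext")
print(cipher.hex())   # bbf316e8d940af0ad3
```

Since RC4 is symmetric, running `rc4_crypt` again with a freshly initialized state gives back the plaintext.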
Well, here we are... we found the key... but are we done? Usually the answer would be "yes", but in this case there's more work to do and this is where the code becomes intricate.
Tracking the key

Now we know that the key is passed to the "rc4_init" function as the first argument and we want to track it back to see its content. So, we follow the code using the Cross References and notice that eax corresponds to arg_0, as it is pushed right before the call to "rc4_init":
.text:1000E69F                 call    get_key_object
.text:1000E6A4                 push    eax
.text:1000E6A5                 lea     ecx, [esi+4]
.text:1000E6A8                 call    rc4_init

What about eax?
It comes from the "get_key_object" call, from which we get:
.text:1000C537                 mov     eax, [ecx+4]
.text:1000C53A                 mov     eax, [eax+0Ch] 
.text:1000C53D                 add     eax, [ecx+8]
.text:1000C540                 retn

A little remark: as a convention, the C++ "this" pointer is stored in the ecx register. If you are interested in reversing C++ applications you should read this paper as a starting point. More info about the "this" pointer can be found here.
Basically, the code above reads a pointer and then adds something to it, leading to the final pointer to the key. In particular, you can picture the whole code as a "memory buffer" object that contains a pointer to the data and an index to access it.
Something like this:

 00 |    ...        |          Obj_data
    +---------------+      +---------------+
 04 | ptr Obj_data  | ---> |     ...       | 00
    +---------------+      +---------------+
 08 |   Index       |      |     ...       | 04
    +---------------+      +---------------+
    |    ...        |      |     ...       | 08      Key
                           +---------------+         +--+
                           | ptr byte Key  | 0C ---> |  | 0
                           +---------------+         +--+
                           |    ...        |         |  | 1
                                                     |..| 2
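
The same picture can be replayed in a few lines of Python. This is only a model of the three instructions of "get_key_object" over a fake memory; the addresses (0x10, 0x40, 0x80), the index value and the key bytes are all made up for the example.

```python
import struct

mem = bytearray(0x100)   # fake little-endian memory

def read_u32(addr):
    return struct.unpack_from("<I", mem, addr)[0]

def write_u32(addr, value):
    struct.pack_into("<I", mem, addr, value)

def get_key_object(this):
    eax = read_u32(this + 4)                       # mov eax, [ecx+4]
    eax = read_u32(eax + 0x0C)                     # mov eax, [eax+0Ch]
    eax = (eax + read_u32(this + 8)) & 0xFFFFFFFF  # add eax, [ecx+8]
    return eax

# Hypothetical layout: object at 0x10, Obj_data at 0x40, key bytes at 0x80
OBJ, OBJ_DATA, KEY = 0x10, 0x40, 0x80
write_u32(OBJ + 4, OBJ_DATA)       # ptr Obj_data
write_u32(OBJ + 8, 2)              # Index
write_u32(OBJ_DATA + 0x0C, KEY)    # ptr byte Key
mem[KEY:KEY + 4] = b"\xAA\xBB\xCC\xDD"

ptr = get_key_object(OBJ)
print(hex(ptr), hex(mem[ptr]))
```

With an index of 2, the returned pointer lands on the third byte of the key buffer.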

Now we have to follow ecx before "get_key_object" is called, and we see:
.text:1000E69C                 lea     ecx, [ebp+var_20]
So, we want to investigate when "var_20" is filled with a value.
.text:1000E67F                 mov     esi, ecx
.text:1000E681                 push    [ebp+arg0]
.text:1000E684                 lea     eax, [ebp+var_20]
.text:1000E687                 lea     ebx, [esi+108h]
.text:1000E68D                 push    eax
.text:1000E68E                 call    key_from_arg0?
From the code above we may think that the key is passed through arg0, but if we try to follow arg0 via Cross Reference we don't go very far:
.text:1000E5CE                 push    0              
.text:1000E5D0                 lea     eax, [ebp+var_20]
.text:1000E5D3                 push    eax             
.text:1000E5D4                 xor     ebx, ebx
.text:1000E5D6                 call    instantiate_object
.text:1000E5DB                 mov     byte ptr [ebp+var_4], 2
.text:1000E5DF                 push    eax             
.text:1000E5E0                 mov     ecx, esi
.text:1000E5E2                 call    do_rc4
arg0 is the first parameter of the function we were in before following the Cross Reference (let's call it "do_rc4"), so we have to follow eax, which is the return value of the "instantiate_object" function. This call takes 0 and var_20 as its parameters and returns an empty object.

Dead end, indeed... or maybe not! Let's reconsider the parameters passed to the "key_from_arg0?" function: maybe the parameter we are interested in isn't passed via the stack, but via a register... Maybe the missing piece is the instruction:
.text:1000E687                 lea     ebx, [esi+108h]

and we have to follow esi+108h instead of arg0!
At the top of the "do_rc4" function we notice:
.text:1000E67F                 mov     esi, ecx
So, esi+108h is passed to the "do_rc4" function, via the "this" pointer.
Now let's follow back the cross reference; if we scroll up the code we notice:
.text:1000E5B2                 push    [ebp+p_key_bytes]
.text:1000E5B5                 mov     ebx, [ebp+arg_8]
.text:1000E5B8                 lea     eax, [esi+108h]
.text:1000E5BE                 push    eax             
.text:1000E5BF                 mov     dword ptr [esi], offset off_10073520
.text:1000E5C5                 call    instantiate_object
This totally makes sense! There is a second call to the "instantiate_object" function and this time its parameters are p_key_bytes and esi+108h. It makes us think that this function creates an object with the bytes of the key from p_key_bytes and puts its address in esi+108h.
Ok, here we go... Again! Recursive way to think: let's call "do_rc4_2" the function we are in and follow p_key_bytes via Cross Reference to see when it is filled.
.text:1000129A                 lea     ecx, [ebp+58h]
.text:1000129D                 call    get_key_object
.text:100012A2                 push    eax         
.text:100012A3                 lea     eax, [ebp-1F4h]
.text:100012A9                 push    eax            
.text:100012AA                 call    do_rc4_2

"p_key_bytes" is the second parameter of "do_rc4_2" and to investigate its value we have to follow eax, that is... the return value of the "get_key_object" function we have already described. It reads an object from the address contained in ecx... that is... the one contained in ebp+58h! Really, really weird!
Why ebp+58h? Are there so many parameters on the stack?

In order to understand the situation properly, we have to go to the beginning of the function we are in, that is, the one containing the call to "do_rc4_2":
.text:10001230                 push    ebp
.text:10001231                 sub     esp, 48h
.text:10001234                 mov     eax, offset sub_1006A3CF
.text:10001239                 call    __EH_prolog
To skip some boring calculations, let's just say that "__EH_prolog" sets the value of ebp to esp-4. So, after the execution of these instructions, the stack will look like this:
... [prolog][48h bytes][ebp][ret_addr][param_1][param_2] ...
prolog + 48h + ebp + ret_addr + param_1 = 4h + 48h + 4h + 4h + 4h = 58h
It sounds good! It means that the code points to param_2.
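
The arithmetic can be wrapped in a tiny Python helper, which is handy when the same pattern shows up in other functions. It simply follows the stack layout described above (it does not re-derive what "__EH_prolog" does), and the function name is mine.

```python
DWORD = 4

def param_offset(locals_size, param_index):
    """ebp-relative offset of the 1-based parameter, using the layout
    [prolog][locals][ebp][ret_addr][param_1][param_2]... from the post."""
    prolog = DWORD
    saved_ebp = DWORD
    ret_addr = DWORD
    return prolog + locals_size + saved_ebp + ret_addr + DWORD * (param_index - 1)

print(hex(param_offset(0x48, 2)))   # 0x58
```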
Once again... we call "go" the function we are in, and look for the "go" second parameter via Cross Reference.
.text:10003254                 sub     esp, 14h
.text:10003257                 mov     eax, esp
.text:10003259                 mov     [ebp+78h], esp
.text:1000325C                 push    eax
.text:1000325D                 mov     ebx, [ebp+68h]
.text:10003260                 call    do_newcopy_addref
.text:10003265                 mov     byte ptr [ebp-4], 2
.text:10003269                 push    dword_10091C08
.text:1000326F                 mov     byte ptr [ebp-4], 1
.text:10003273                 call    go              
And here comes the problem... we are looking for the second parameter, but there's only one push! Don't panic.
Let's take a look at the code: first it allocates memory on the stack, using the sub esp, 14h instruction, and then it calls the "do_newcopy_addref" function, which copies something from the value at the address in ebp+68h into the 14h bytes just reserved at the top of the stack (once again, ebp+68h is passed via register!).
So, we have to re-figure out what the stack looks like:
... [prolog][48h bytes][ebp][ret_addr][param_1][14h bytes object] ...
Basically, param_2 is a 14h bytes object.
This is unusual, as normally the code would have passed a pointer to the object instead of the object itself. This also makes the code more difficult to analyze because, in this way, IDA cannot recognize the parameter anymore.
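
Here is a small Python sketch of the stack bookkeeping behind this, using an arbitrary starting value for esp. It assumes that "do_newcopy_addref" removes its own argument from the stack, so that esp is back at the reserved area before param_1 is pushed; that is my assumption, not something shown in the listing.

```python
DWORD = 4
OBJ_SIZE = 0x14

esp = 0x1000              # arbitrary starting stack pointer
esp -= OBJ_SIZE           # sub esp, 14h: room for the object itself
obj_start = esp
# do_newcopy_addref fills [obj_start, obj_start + 14h); assumed to
# clean up its own pushed argument, leaving esp unchanged afterwards.
esp -= DWORD              # push dword_10091C08 (param_1)
param_1_addr = esp
esp -= DWORD              # call go: pushes the return address
ret_addr_slot = esp

# What the callee sees above its return address:
param_2_start = ret_addr_slot + 2 * DWORD
param_2_end = param_2_start + OBJ_SIZE

print(hex(param_1_addr), hex(param_2_start), hex(param_2_end))
```

The second parameter starts exactly where the object was built, and it spans 14h bytes rather than a single dword.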
We are almost done: let's focus on ebp+68h and try to track it back!
.text:1000323E                 push    dword ptr [ebp+78h]
.text:10003241                 lea     eax, [ebp+68h]  
.text:10003244                 push    eax
.text:10003245                 call    sub_1000346A   
The reasoning is always the same: we see a function with two parameters, one of which is ebp+68h; so, we can suppose that the other one, that is ebp+78h, points to the bytes of the key and the function instantiates an object by making a copy from the key itself.
Now, we have to follow ebp+78h. It reminds us of the weird parameter ebp+58h we saw before... So, again, we go to the beginning of the function and notice:
.text:100031FA                 push    ebp
.text:100031FB                 sub     esp, 6Ch
.text:100031FE                 mov     eax, offset loc_1006ACBC
.text:10003203                 call    __EH_prolog

This time the stack will look like this:
... [prolog][6Ch bytes][ebp][ret_addr][param_1][param_2] ...
prolog + 6Ch + ebp + ret_addr = 4h + 6Ch + 4h + 4h = 78h
So, ebp+78h points to param_1.
Again, we go via Cross Reference to follow param_1 and see:
.text:100126FB                 push    [ebp+arg_0]
.text:100126FE                 call    sub_100031FA
arg0 is our target! Another first parameter to follow, another Cross Reference to see:
.text:100033BA                 push    [ebp+arg_0]
.text:100033BD                 call    sub_100126D5
But now we are in a very special function:
.text:100033A4 UpdateTBSList   proc near

It is an exported function but, even knowing that, we still cannot retrieve the key, as the function is not called from within the module itself...!
Here is a visual representation of the whole analysis we have done:

I hope this discussion has given you an idea of how much this kind of structured code can complicate things... although we went very deep into the code to track the key back, even at the end of our analysis we didn't find its value!
Are we close to it? Mmm... close enough at least :P
I'm not going to describe every single detail, but let's just think of the next logical step.
You may think about looking for the call to "UpdateTBSList" in the other components of Flame, but you won't find anything because the strings are encrypted! So, first you have to decrypt the strings and then you can look in every component of the malware to find where the export is called :)
But, even knowing that... once you have finally retrieved the key... what is it useful for? Was this time-consuming effort worth it?
Well, it definitely is but, to understand why, you should conduct further investigation... :) This "never ending task" makes us think of the direction malware analysis is taking these days: a lot of effort, a lot of patience and a lot of dedication are required to perform even a small analysis like this!