Sunday, March 10, 2013

Binary Instrumentation for Exploit Analysis Purposes (part 1)


This article is about applying binary instrumentation to various exploit scenarios. In particular, we are going to use Pin, a tool developed by Intel, to show how this approach can help with the analysis.

Pin is used to create dynamic program analysis tools, the so-called "Pintools". Once executed, a Pintool acts almost like a virtual machine that runs the code of a target executable image and rebuilds it, adding the code you need to perform your own analysis. For example, you can install a callback that is invoked every time an instruction is executed, inspect registers, alter the context, and so on.

Note: I've tested everything on 32-bit Windows XP with Visual Studio 2010.

How to compile and execute a Pintool.

The simplest way to compile a Pintool is to use the Visual Studio project provided by Intel, located in the Pin folder at \source\tools\MyPinTool.

To run it, simply type: pin -t <your_pintool.dll> -- <application_path>
This way, your Pintool is executed within the application you want to test.

How to code a Pintool: a (very) short description.

A Pintool begins by initializing the Pin engine with the "PIN_Init()" function; then you register the callbacks for the events you want to handle.
For instance, you can use:
  • "INS_AddInstrumentFunction()" to register a callback that Pin invokes each time it encounters a new instruction, before that instruction is executed for the first time;
  • "IMG_AddInstrumentFunction()" to register a callback that notifies you every time an executable module is loaded;
  • "PIN_AddThreadStartFunction()" and "PIN_AddThreadFiniFunction()" to handle thread creation and termination.

In particular, inside a callback registered with "INS_AddInstrumentFunction()" you can call "INS_InsertCall()" to attach further callbacks to the instruction being instrumented.
These callbacks have a special property: they run every time the instruction executes, either before or after it. You can also pass them any kind of data, including the value of specific registers (the instruction pointer, for instance), memory addresses and so on.

Finally, you'll have to use "PIN_AddFiniFunction()" to register the callback that is invoked when the application quits.

Once all the callbacks are registered, you can start the instrumented program by calling "PIN_StartProgram()".

Your Pintool can filter specific conditions at a very fine granularity, but bear in mind that performance may degrade badly depending on the actions you choose to perform.

As an example, consider "INS_AddInstrumentFunction()" again, and suppose we register a callback that logs every executed instruction to a file: if you are careless, you may end up performing file I/O for every single instruction, which is very inefficient. Another operation that hurts your Pintool's performance when called frequently is disassembly.
So be careful: a well-written Pintool lets the instrumented application run at almost native speed, while a bad implementation can slow it down to the point where it takes minutes to run.
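One common way to keep the per-instruction cost low is to collect addresses in memory and write them out in batches. The following is only a minimal sketch of that idea in plain C++: "TraceBuffer" is a hypothetical helper, not part of the Pin API, and it assumes a single-threaded target (a multi-threaded one would need locking).

```cpp
#include <cstddef>
#include <cstdio>
#include <vector>

// Hypothetical helper (not part of Pin): stores addresses in memory and
// formats/writes them only once every `capacity` records.
class TraceBuffer {
public:
    TraceBuffer(std::FILE *out, std::size_t capacity)
        : out_(out), capacity_(capacity), flushes_(0) {
        buf_.reserve(capacity);
    }

    // Cheap path, meant to be called once per executed instruction.
    void Record(unsigned int addr) {
        buf_.push_back(addr);
        if (buf_.size() >= capacity_) Flush();
    }

    // Expensive path: formats and writes everything collected so far.
    void Flush() {
        for (std::size_t i = 0; i < buf_.size(); ++i)
            std::fprintf(out_, "%08x \n", buf_[i]);
        buf_.clear();
        ++flushes_;
    }

    std::size_t flushes() const { return flushes_; }
    std::size_t pending() const { return buf_.size(); }

private:
    std::FILE *out_;
    std::size_t capacity_;
    std::vector<unsigned int> buf_;
    std::size_t flushes_;
};
```

In a Pintool you would presumably call "Record()" from the per-instruction callback and "Flush()" once more from the callback registered with "PIN_AddFiniFunction()", so that the last partial batch is not lost.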

A basic Pintool.

Here is a very basic Pintool to which we will add more specific functions later.

 #include <stdio.h>  
 #include "pin.H"   
 namespace WINDOWS  
 {  
     #include <windows.h>  
 }  
 FILE * OutTrace;  
 ADDRINT ExceptionDispatcher = 0;
 /* ===================================================================== */  
 /* Instrumentation functions                                             */  
 /* ===================================================================== */  
 // Called before every executed instruction; receives the instruction pointer  
 VOID DetectEip(ADDRINT AddrEip, ...)   
 {  
     if(AddrEip == ExceptionDispatcher)  
     {  
         fprintf(OutTrace, "%08x Exception occurred!\n", AddrEip);   
     }  

     // Here you can call the functions that we will add
     //(you should also remove the next line to avoid tracing every instruction being executed)
     fprintf(OutTrace, "%08x \n", AddrEip);  
 }  
 // Pin calls this function every time a new instruction is encountered  
 VOID Instruction(INS Ins, VOID *v)  
 {  
     // Insert a call to DetectEip before every instruction, and pass it the IP  
     INS_InsertCall(Ins, IPOINT_BEFORE, (AFUNPTR)DetectEip, IARG_INST_PTR, IARG_END);  
 }  
 // Pin calls this function every time a module is loaded  
 VOID ImageLoad(IMG Img, VOID *v)  
 {  
     fprintf(OutTrace, "Loading module %s \n", IMG_Name(Img).c_str());  
     fprintf(OutTrace, "Module Base: %08x \n", IMG_LowAddress(Img));  
     fprintf(OutTrace, "Module end: %08x \n", IMG_HighAddress(Img));  
 }  
 /* ===================================================================== */  
 /* Finalization function                                                 */  
 /* ===================================================================== */  
 // This function is called when the application exits  
 VOID Fini(INT32 code, VOID *v)  
 {  
     fprintf(OutTrace, "Terminating execution\n");  
     // Flush and close the log  
     fclose(OutTrace);  
 }  
 /* ===================================================================== */  
 /* Print Help Message                                                    */  
 /* ===================================================================== */  
 INT32 Usage()  
 {  
     PIN_ERROR("Init error\n");  
     return -1;  
 }  
 /* ===================================================================== */  
 /* Main                                                                  */  
 /* ===================================================================== */  
 int main(int argc, char * argv[])  
 {  
     OutTrace = fopen("itrace.txt", "wb");  
     // Exceptions are dispatched through ntdll!KiUserExceptionDispatcher  
     WINDOWS::HMODULE hNtdll;  
     hNtdll = WINDOWS::LoadLibrary("ntdll");  
     ExceptionDispatcher = (ADDRINT)WINDOWS::GetProcAddress(hNtdll, "KiUserExceptionDispatcher");  
     fprintf(OutTrace, "Exception handler address: %08x \n", ExceptionDispatcher);  
     // Initialize pin  
     if (PIN_Init(argc, argv))   
     {  
         return Usage();  
     }  
     // Register Instruction to be called to instrument instructions  
     INS_AddInstrumentFunction(Instruction, 0);  
     // Register ImageLoad to be called at every module load  
     IMG_AddInstrumentFunction(ImageLoad, 0);  
     // Register Fini to be called when the application exits  
     PIN_AddFiniFunction(Fini, 0);  
     // Start the program, never returns  
     fprintf(OutTrace, "Starting Pintool\n");   
     PIN_StartProgram();  
     return 0;  
 }  

It basically logs to a file the address of each executed instruction, every exception that occurs, and the name of each loaded module together with its base and end address.

I have also put a comment in the "DetectEip()" function, to specify where you can call the functions we will add later.

First exploit scenario: stack overflow.

As a first case study, we are going to consider a specially crafted sample:

 #include <stdio.h>  
 #include <string.h>  
 // 0xFF 0xE4 is the opcode of JMP ESP  
 unsigned char Var[2] = {0xFF, 0xE4};  
 void GetPassword(){  
   char Password[12];  
   memset(Password, 0, sizeof(Password));  
   printf("Insert your password (max 12 chars):\n");  
   int i = -1;  
   // Deliberately vulnerable: reads until CR/LF without checking the buffer size  
   do {  
     i++;  
     Password[i] = getchar();  
   } while (Password[i] != 0x0D && Password[i] != 0x0A);  
   Password[i] = 0;  
   printf("Your password is: %s \n", Password);  
 }  
 void main(void){  
   GetPassword();  
 }  

Before compiling and linking it (I used Visual Studio 2010), be sure to disable all the security options (stack canaries, DEP, ASLR) and to set the Base Address to 0x41410000.
I know it might sound a little unrealistic, and in fact... it is! But don't worry: this is just the simplest example that crossed my mind, and we are going to use it as a first test. The methodology I'm proposing is nonetheless effective, and we will see a real case study later.

First, we need to "exploit" this little test: I'll be quick. We can open the executable with OllyDbg and debug it until we reach the "getchar" call that reads the input. Then we enter the following string (it worked in my case; check the values explained below if you want to be 100% sure!): "123456789abcAAAA 0AABBBBBBBBBBBBBBBBBBB" (without the surrounding quotes).

What does it mean? The first 12 characters fill the "Password" buffer. Because the program never checks the length of the input, the remaining characters overwrite what follows the buffer on the stack:

  • "AAAA" covers the 4 bytes of padding added by the compiler;
  • " 0AA" is the address 0x41413020 (it reads "AA0 " backwards because x86 is little-endian), where the "JMP ESP" instruction (opcode "0xFF 0xE4", the two bytes stored in "Var") is located; this value overwrites the return address of the "main" function;
  • the run of "B" characters decodes to "INC EDX" instructions (opcode 0x42); this is where you would normally put the shellcode, but for a test any valid instruction will do!
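The layout of the input string can be sketched in a few lines of plain C++. This is only an illustration: "LittleEndian32" and "BuildInput" are hypothetical helper names, and the offsets (12 bytes of buffer, 4 bytes of padding) are the ones that worked for the build described here, so they may differ on yours.

```cpp
#include <cstddef>
#include <string>

// Encode a 32-bit address as the 4 bytes x86 expects to find in memory:
// least significant byte first (little-endian).
std::string LittleEndian32(unsigned long addr) {
    std::string out;
    for (int i = 0; i < 4; ++i)
        out += static_cast<char>((addr >> (8 * i)) & 0xFF);
    return out;
}

// Hypothetical helper: assembles the input described in the list above.
// filler_count is arbitrary; the filler stands in for the shellcode.
std::string BuildInput(unsigned long jmp_esp_addr, std::size_t filler_count) {
    std::string input = "123456789abc";       // fills the 12-byte buffer
    input += "AAAA";                           // 4 bytes of padding
    input += LittleEndian32(jmp_esp_addr);     // overwritten return address
    input += std::string(filler_count, 'B');   // 0x42 = INC EDX
    return input;
}
```

For example, "LittleEndian32(0x41413020)" yields the four characters " 0AA", which is why the address appears reversed in the typed string.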

Now that you have checked that my string also works in your case, or you have built your own, we are ready to analyze our first exploit scenario: a simple stack overflow. How can we detect it?
The most natural idea is to check EIP and see whether it points into an area that should not contain code (the stack, in this case).

The Pintool keeps two variables holding the base and end address of the module currently being executed.
If EIP falls outside that range, the Pintool walks the list of modules maintained by Pin, looking for the loaded module that contains EIP (this happens, for instance, after an API call). When such a module is found, the two variables are updated, making it the current module.
If EIP does not belong to any module, the Pintool reports it as suspicious and logs the last 1000 executed values of EIP.

Here is the code to do that:

 #define LAST_EXECUTED 1000  
 UINT32 LastExecutedPos;
 // Circular buffer holding the last executed values of EIP  
 ADDRINT LastExecutedBuf[LAST_EXECUTED];  
 ADDRINT CurrentModuleBase, CurrentModuleEnd;  
 // Looks for the loaded module containing Addr and makes it the current one  
 bool IsModuleFound(ADDRINT Addr)  
 {  
     for(IMG Img = APP_ImgHead(); IMG_Valid(Img); Img = IMG_Next(Img))  
     {  
         if(Addr >= IMG_LowAddress(Img) &&  
             Addr <= IMG_HighAddress(Img))    // <=, not <: the end address is inclusive  
         {  
             CurrentModuleBase = IMG_LowAddress(Img);  
             CurrentModuleEnd = IMG_HighAddress(Img);  
             return true;  
         }  
     }  
     return false;  
 }  
 void CheckEipModule(ADDRINT AddrEip)  
 {  
     UINT32 i;  
     if(! (AddrEip >= CurrentModuleBase && AddrEip <= CurrentModuleEnd) )  
     {  
         if(!IsModuleFound(AddrEip))  
         {  
             // eip is not within an executable image!  
             fprintf(OutTrace, "EIP detected not within an executable module: %08x \n", AddrEip);  
             fprintf(OutTrace,"Dumping list of previously executed EIPs \n");  
             // oldest entries first: from the write position to the end...  
             for(i = LastExecutedPos; i < LAST_EXECUTED; i++)  
                 fprintf(OutTrace, "%08x \n", LastExecutedBuf[i]);   
             // ...then from the start up to the write position  
             for(i = 0; i < LastExecutedPos; i++)  
                 fprintf(OutTrace, "%08x \n", LastExecutedBuf[i]);   
             fprintf(OutTrace, "%08x \n --- END ---", AddrEip);   
         }  
     }  
     LastExecutedBuf[LastExecutedPos] = AddrEip;  
     LastExecutedPos++;  
     if(LastExecutedPos >= LAST_EXECUTED)  
     {  
         // circular logging  
         LastExecutedPos = 0;  
     }  
 }  

You can simply copy it into the basic Pintool provided above, but remember to also add the line:

 CheckEipModule(AddrEip);  

in the "DetectEip()" function (where indicated by the comment).
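The circular logging of the last executed addresses can be tried out on its own, outside Pin. Below is a minimal sketch in plain C++: "EipHistory" is a hypothetical stand-in for the buffer and write position used by the Pintool, and it assumes a capacity greater than zero. Unlike the Pintool code, it skips the slots that have never been written, so a dump taken before the buffer has filled up contains no empty entries.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the Pintool's buffer: a fixed-size ring that
// always holds the most recent `capacity` addresses (capacity must be > 0).
class EipHistory {
public:
    explicit EipHistory(std::size_t capacity)
        : buf_(capacity, 0), pos_(0), filled_(false) {}

    void Push(unsigned long eip) {
        buf_[pos_] = eip;
        if (++pos_ >= buf_.size()) {  // wrap around: circular logging
            pos_ = 0;
            filled_ = true;
        }
    }

    // Returns the stored addresses from oldest to newest.
    std::vector<unsigned long> Dump() const {
        std::vector<unsigned long> out;
        if (filled_)  // the oldest entries sit after the write position
            out.insert(out.end(), buf_.begin() + pos_, buf_.end());
        out.insert(out.end(), buf_.begin(), buf_.begin() + pos_);
        return out;
    }

private:
    std::vector<unsigned long> buf_;
    std::size_t pos_;
    bool filled_;
};
```

With a capacity of 3, pushing the values 1, 2, 3, 4 and then dumping returns 2, 3, 4: the oldest value has been overwritten and the order of execution is preserved.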

Compile/link the Pintool and execute it.

Once executed, it will generate a log (I've cut some lines!) like the following:

Exception handler address: 7c91eaec 
Starting Pintool
Loading module C:\...\StackBof.exe 
Module Base: 41410000 
Module end: 41414fff 
Loading module C:\WINDOWS\system32\kernel32.dll 
Module Base: 7c800000 
Module end: 7c8fefff 
Loading module C:\WINDOWS\system32\ntdll.dll 
Module Base: 7c910000 
Module end: 7c9c5fff 
Loading module C:\WINDOWS\system32\MSVCR100.dll 
Module Base: 78aa0000 
Module end: 78b5dfff 
EIP detected not within an executable module: 0012ff84 
Dumping list of previously executed EIPs 
 --- END ---

It's very simple to understand what happened just by reading the log:

  • the RET instruction is located at address "0x4141105A";
  • it jumps to the overwritten return address, "0x41413020", where a "JMP ESP" is located;
  • the Pintool correctly detects that we are trying to execute code outside any executable module (at address "0x0012FF84", which belongs to the stack).
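The classification can be double-checked offline against the module ranges printed in the log. The following is just a sketch in plain C++: "ModuleRange" and "FindModule" are hypothetical names, and the table only contains the four modules shown in the (shortened) log above.

```cpp
#include <cstddef>

struct ModuleRange {
    const char *name;
    unsigned long low;   // "Module Base" in the log
    unsigned long high;  // "Module end" in the log (inclusive)
};

// Ranges copied from the log above.
static const ModuleRange kModules[] = {
    {"StackBof.exe", 0x41410000UL, 0x41414fffUL},
    {"kernel32.dll", 0x7c800000UL, 0x7c8fefffUL},
    {"ntdll.dll",    0x7c910000UL, 0x7c9c5fffUL},
    {"MSVCR100.dll", 0x78aa0000UL, 0x78b5dfffUL},
};

// Returns the module containing addr, or NULL if no module does.
const ModuleRange *FindModule(unsigned long addr) {
    for (std::size_t i = 0; i < sizeof(kModules) / sizeof(kModules[0]); ++i)
        if (addr >= kModules[i].low && addr <= kModules[i].high)
            return &kModules[i];
    return NULL;
}
```

Both "0x4141105A" and "0x41413020" fall inside StackBof.exe, while "0x0012FF84" matches none of the listed ranges, which is exactly the condition the Pintool reports.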


This was an introductory article on binary instrumentation for exploit analysis, and I really hope you liked it! See you in a few days for the second part, where I will discuss another scenario: a real PDF exploit that makes use of ROP and heap spraying.

