How to find the "exit" of a C program

I am testing on 32-bit x86 Linux.

So, basically, I am trying to record information about the executed basic blocks by inserting instrumentation instructions into the assembly code.

My strategy is this: write the index of each executed basic block into a global array, and flush the array from memory to disk when the array is full (16M).
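As a rough illustration of this strategy, a C version might look like the sketch below. The names `trace_buf`, `record_block`, `flush_trace` and the file `trace.bin` are made up for the example, and the capacity is shrunk from 16M so it is easy to try. The real instrumentation is inserted assembly, not a C call.

```c
#include <stdint.h>
#include <stdio.h>

#define TRACE_CAPACITY 4096u           /* assumption: small for the demo, the post uses 16M */

static uint32_t trace_buf[TRACE_CAPACITY];
static size_t   trace_len;             /* number of valid entries in trace_buf */
static const char *trace_path = "trace.bin";   /* hypothetical output file */

/* Append the buffered indices to the trace file and empty the buffer.
 * Returns 0 on success, -1 if the file could not be written. */
static int flush_trace(void)
{
    if (trace_len == 0)
        return 0;
    FILE *f = fopen(trace_path, "ab");
    if (f == NULL)
        return -1;
    size_t written = fwrite(trace_buf, sizeof trace_buf[0], trace_len, f);
    int ok = (written == trace_len);
    if (fclose(f) != 0)
        ok = 0;
    trace_len = 0;
    return ok ? 0 : -1;
}

/* Called once per executed basic block by the inserted instrumentation. */
static void record_block(uint32_t block_index)
{
    trace_buf[trace_len++] = block_index;
    if (trace_len == TRACE_CAPACITY)
        (void)flush_trace();
}
```

Whatever is still in `trace_buf` when the program ends is lost unless `flush_trace` is called one last time, which is exactly the problem below.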

Here is my problem. I need to flush the array to disk when the instrumented binary finishes executing, even if it has not reached the 16M boundary. However, I just don't know where to find the exit of the assembly program.

I tried this:

  • grep for exit in the target assembly program and flush the memory before the call exit instruction. But according to some debugging results, the target C program, say the md5sum binary, does not call exit when it finishes executing.

  • Flush the memory at the end of the main function. However, in the assembly code, I just don't know where exactly the end of the main function is. I could take a conservative approach, say, look for all ret instructions, but it seems to me that not every main function ends with a ret instruction.

So, here is my question: how do I identify the exact end of execution in the assembly code and insert some instrumentation instructions there? Hooking some library code is fine with me. I understand that with different inputs the binary may exit at different positions, so I think I need a conservative estimate. Am I clear? Thanks!

+5
5 answers

I believe that you cannot do this in the general case. First, if main returns some code, that is the exit code (if main has no explicit return, then under recent C standards the compiler must add an implicit return 0;). Second, a function can store the address of exit in some data (for example, a global variable, a field in a struct, ...), and some other function can then call it indirectly through that function pointer. In practice, a program can also load plugins using dlopen and use dlsym to look up the name "exit", or just call exit inside a plugin, etc. AFAIU, solving this problem (finding the actual exit calls in the dynamic sense) in full generality can be proved equivalent to the halting problem. See also Rice's theorem.

Without claiming to be exhaustive, I would suggest something else (assuming you are interested in instrumenting programs coded in C or C++, etc., whose source code is available to you). You can customize the GCC compiler with MELT to modify the basic blocks processed inside GCC so that they call some of your instrumentation functions. This is not trivial, but it is doable ... Of course, you will need to recompile the C code with the customized GCC in order to instrument it.

(Disclaimer: I am the main author of MELT; feel free to contact me for help ...)

By the way, do you know about atexit(3)? It may be useful for your flushing problem ... You could also use LD_PRELOAD tricks (read about dynamic linkers, see ld-linux(8)).
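To make the atexit(3) suggestion concrete, here is a minimal sketch. The buffer `pending`, the helper `install_exit_flush` and the file name are hypothetical stand-ins for whatever your instrumentation uses; only `atexit` itself is the real libc call.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

static uint32_t pending[1024];         /* hypothetical in-memory trace buffer */
static size_t   pending_len;           /* number of valid entries in pending */
static const char *dump_path = "trace.bin";    /* hypothetical output file */

/* Write whatever is still buffered; harmless to call more than once. */
static void flush_pending(void)
{
    if (pending_len == 0)
        return;
    FILE *f = fopen(dump_path, "ab");
    if (f == NULL)
        return;
    fwrite(pending, sizeof pending[0], pending_len, f);
    fclose(f);
    pending_len = 0;
}

/* Register the flush once, for example from the first instrumented block.
 * Returns 0 on success, -1 if atexit refused the registration. */
static int install_exit_flush(void)
{
    static int installed;
    if (installed)
        return 0;
    if (atexit(flush_pending) != 0)
        return -1;
    installed = 1;
    return 0;
}
```

The handler runs when the program calls exit() or returns from main. It does not run on _exit() or when the process is killed by a signal.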

+4

atexit() will correctly handle 95% of programs. You can either modify its chain of registered handlers, or instrument it like the other blocks. However, some programs may exit with _exit(), which does not call atexit handlers. Instrumenting _exit() to trigger your data flush and installing an atexit() (or on_exit() in BSD-like programs) handler should probably cover almost 100% of programs.


Addendum: note that the Linux Standard Base specification says that the C library startup must:

call the initialization function (*init)().
call main() with the appropriate arguments.
call exit() with the return value from main().
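The difference between exit() and _exit() is easy to check for yourself. Below is a small sketch, assuming a POSIX system; `handler_runs_on` and `report_handler_ran` are names I made up. It forks a child that registers an atexit handler and then terminates one way or the other, and reports whether the handler actually ran.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1              /* exposes fork/pipe under strict -std= modes */
#endif
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int report_fd = -1;             /* write end of the pipe, used by the handler */

/* atexit handler: tell the parent that exit-time handlers really ran. */
static void report_handler_ran(void)
{
    if (report_fd >= 0) {
        char mark = 'x';
        ssize_t n = write(report_fd, &mark, 1);
        (void)n;
    }
}

/* Fork a child that registers the handler and terminates via exit() or
 * _exit(). Returns 1 if the handler ran, 0 if it did not, -1 on error. */
static int handler_runs_on(int use_underscore_exit)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;
    fflush(NULL);                      /* do not duplicate buffered stdio output in the child */
    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {                    /* child */
        close(fds[0]);
        report_fd = fds[1];
        if (atexit(report_handler_ran) != 0)
            _exit(2);
        if (use_underscore_exit)
            _exit(0);                  /* skips atexit handlers */
        exit(0);                       /* same call the startup code makes after main returns */
    }
    close(fds[1]);                     /* parent keeps only the read end */
    char mark;
    ssize_t n = read(fds[0], &mark, 1);    /* 0 bytes means EOF: the handler never wrote */
    close(fds[0]);
    int status;
    waitpid(pid, &status, 0);
    return n == 1 ? 1 : 0;
}
```

Calling `handler_runs_on(0)` exercises the exit() path and `handler_runs_on(1)` the _exit() path, so a flush that relies only on atexit misses the second case.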

+2

A method that should work every time is to create a shared memory segment and store your data there.

You also create a child process that waits for the debugged process to finish.

Once the debugged process has finished, the child process writes out the data that is in shared memory.

This should work for all forms of exit and process interruption (for example, Ctrl+C, closing the terminal window ...), or even if the process was killed with kill.
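Here is one way this could be sketched on Linux, not a definitive implementation. The names `shared_trace`, `start_watcher` and `watcher_main` are mine, and the "wait for the process to finish" part uses a pipe trick that I am assuming is acceptable here: the watcher blocks on a pipe whose only write end is held by the traced process, so read() returns end-of-file as soon as that process is gone, however it died.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1              /* exposes MAP_ANONYMOUS under strict -std= modes */
#endif
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

#define SHARED_CAPACITY 4096u          /* assumption: small demo size */

struct shared_trace {
    size_t   count;                    /* number of valid entries in data[] */
    uint32_t data[SHARED_CAPACITY];
};

/* Watcher: wait until the traced process is gone, dump the buffer, exit. */
static void watcher_main(int alive_fd, const struct shared_trace *trace,
                         const char *path)
{
    char byte;
    setsid();                          /* own session, so terminal Ctrl+C is not delivered here */
    for (;;) {
        ssize_t n = read(alive_fd, &byte, 1);
        if (n == 0)
            break;                     /* EOF: every write end is closed, the process died */
        if (n < 0 && errno != EINTR)
            break;
    }
    size_t count = trace->count;
    if (count > SHARED_CAPACITY)
        count = SHARED_CAPACITY;
    FILE *f = fopen(path, "wb");
    if (f != NULL) {
        fwrite(trace->data, sizeof trace->data[0], count, f);
        fclose(f);
    }
    _exit(0);
}

/* Call once in the traced process. Returns the shared buffer, or NULL on failure. */
static struct shared_trace *start_watcher(const char *path)
{
    struct shared_trace *trace = mmap(NULL, sizeof *trace, PROT_READ | PROT_WRITE,
                                      MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (trace == MAP_FAILED)
        return NULL;
    int fds[2];
    if (pipe(fds) != 0) {
        munmap(trace, sizeof *trace);
        return NULL;
    }
    fflush(NULL);
    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        munmap(trace, sizeof *trace);
        return NULL;
    }
    if (pid == 0) {
        close(fds[1]);                 /* the watcher must not hold a write end itself */
        watcher_main(fds[0], trace, path);     /* never returns */
    }
    close(fds[0]);
    /* fds[1] is deliberately left open: the kernel closes it when this process dies. */
    return trace;
}
```

The instrumented code then writes into `trace->data` and bumps `trace->count` instead of using a private array. One caveat of the pipe trick: if the traced program forks, its children inherit the write end, so the watcher only fires after all of them have exited.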

0

But according to some debugging results, the target C program, say the md5sum binary, does not call exit when it finishes executing.

Take a look at the md5sum binary on an i686 GNU/Linux system:

Disassembling it ( objdump -d /usr/bin/md5sum ) gives the following:

 Disassembly of section .text:

 08048f50 <.text>:
  8048f50: 55                   push %ebp
  8048f51: 89 e5                mov %esp,%ebp
  8048f53: 57                   push %edi
  8048f54: 56                   push %esi
  8048f55: 53                   push %ebx
  8048f56: 83 e4 f0             and $0xfffffff0,%esp
  8048f59: 81 ec c0 00 00 00    sub $0xc0,%esp
  8048f5f: 8b 7d 0c             mov 0xc(%ebp),%edi
  [ ... ]
  8049e8f: 68 b0 d6 04 08       push $0x804d6b0
  8049e94: 68 40 d6 04 08       push $0x804d640
  8049e99: 51                   push %ecx
  8049e9a: 56                   push %esi
  8049e9b: 68 50 8f 04 08       push $0x8048f50
  8049ea0: e8 4b ef ff ff       call 8048df0 <__libc_start_main@plt>
  8049ea5: f4                   hlt

This is all startup boilerplate code. The program's actual main is called inside the call to __libc_start_main. If the program returns from that, then hey, look, there is a hlt instruction. That is your target. Look for this hlt instruction and instrument it as the end of the program.

0

You can try the following:

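 #include <iostream>
 #include <string>

 int main() {
     bool keepGoing = true;
     while (keepGoing) {
         std::string x;
         // Stop on the word "stop" or when input ends (EOF or read error).
         if (!(std::cin >> x) || x == "stop") {
             keepGoing = false;
         }
     }
 }

although it is primitive ... I probably butchered the syntax, but this is just the concept.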

-1