I need to restore the source code from an executable

It's the middle of the night, and I accidentally overwrote all my work by typing

gcc source.c -o source.c 

I still have the original binary, and my only hope is to decompile it, but I don't know how, or which tool gives the most readable result. I know this is probably not the best place to post, but I'm desperate. Can someone help me please?

+4
6 answers

Thanks for uploading the file. As I suspected, it was not stripped, so the function names remained. Apart from the standard boilerplate code, I could identify the functions main, register_broker, connect_exchange (unused and empty) and handle_requests.

I spent a little time in IDA Pro, and it was not very difficult to recover the main() function. First, here is the original, unmodified listing of main() from IDA: http://pastebin.com/sBxhRJMM

To continue, you need to be familiar with the AMD64 calling convention. In short, the first four integer or pointer arguments are passed in RDI (EDI), RSI (ESI), RDX (EDX) and RCX (ECX). The fifth and sixth go in R8 and R9, and anything beyond that is passed on the stack, but all calls in main() use at most four arguments, so we don't have to worry about that.

IDA helpfully labels the arguments of standard C functions and even renames some local variables. However, the listing can be improved and commented further. For example, since we are in main(), we know that argc (the first argument) comes from EDI (since int is 32-bit, it uses only the low half of RDI) and argv comes from RSI (it is a pointer, so it uses all 8 bytes of the register). Thus, we can rename the local variables into which EDI and RSI are copied:

    mov     [rbp+argc], edi
    mov     [rbp+argv], rsi

Next is a simple conditional block:

    cmp     [rbp+argc], 2
    jz      short loc_400EB3
    mov     rax, cs:stderr@@GLIBC_2_2_5
    mov     rdx, rax
    mov     eax, offset aUsage ; "Usage"
    mov     rcx, rdx        ; s
    mov     edx, 5          ; n
    mov     esi, 1          ; size
    mov     rdi, rax        ; ptr
    call    _fwrite
    mov     edi, 1          ; status
    call    _exit

Here we compare argc with 2, and if they are equal, we jump further down in the code. If they are not equal, fwrite() is called. Its first argument is in rdi, and rdi is loaded from rax, which contains the address of the constant string "Usage". The second argument is in esi and equals 1, the third is in edx and equals 5, and the fourth is in rcx, which is loaded from rdx, which holds the value of stderr@@GLIBC_2_2_5, basically a fancy reference to the stderr variable from libc. Putting it all together, we get:

 fwrite("Usage", 1, 5, stderr); 

From experience, I can say that this is most likely an inlined fprintf (the compiler replaces an fprintf of a constant string with an equivalent fwrite), since 5 is exactly the length of the string. That is, the source code was probably:

 fprintf(stderr, "Usage"); 

The next call is a simple exit(1);. Combining both calls with the comparison, we get:

    if ( argc != 2 )
    {
        fprintf(stderr, "Usage");
        exit(1);
    }
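To check the claim that the fwrite() call seen in the disassembly and the presumed fprintf() call are interchangeable, here is a minimal sketch that writes both into temporary files and compares the bytes. The helper names (usage_via_fwrite, usage_via_fprintf, capture, same_output) are made up for illustration and do not come from the recovered binary; the register comments assume the System V AMD64 convention discussed above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* What the disassembly literally does. Returns the number of items written. */
int usage_via_fwrite(FILE *out)
{
    return (int)fwrite("Usage", /* ptr    -> RDI */
                       1,       /* size   -> ESI */
                       5,       /* n      -> EDX */
                       out);    /* stream -> RCX */
}

/* What the source most likely said. Returns the number of bytes printed. */
int usage_via_fprintf(FILE *out)
{
    return fprintf(out, "Usage");
}

/* Runs emit() against a temporary file and copies what it wrote into buf.
 * Returns the number of bytes captured (0 if no temp file could be created). */
size_t capture(int (*emit)(FILE *), char *buf, size_t cap)
{
    assert(emit != NULL && buf != NULL && cap > 0);
    buf[0] = '\0';

    FILE *tmp = tmpfile();
    if (tmp == NULL)
        return 0;

    emit(tmp);
    rewind(tmp);                      /* flushes, then seeks back to the start */
    size_t n = fread(buf, 1, cap - 1, tmp);
    buf[n] = '\0';
    fclose(tmp);
    return n;
}

/* Returns 1 when both variants produce the same five bytes. */
int same_output(void)
{
    char a[32], b[32];
    size_t na = capture(usage_via_fwrite, a, sizeof a);
    size_t nb = capture(usage_via_fprintf, b, sizeof b);
    return na == 5 && na == nb && memcmp(a, b, na) == 0;
}
```

If same_output() returns 1, the two forms cannot be told apart from the program's output alone, which is why the choice between them in the reconstruction is a guess based on style rather than something the binary proves.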

Continuing in this vein, we can identify the other calls and the variables they use. It would be somewhat tedious to describe all of it, so I uploaded a commented version of the disassembly, in which I tried to show the equivalent C code for each call. You can see it here: http://pastebin.com/p5sRSwgQ

From this commented version, it’s not very difficult to imagine a possible version of main() :

    int main(int argc, char **argv)
    {
        if ( argc != 2 )
        {
            fprintf(stderr, "Usage");
            exit(1);
        }

        char name[256];
        gethostname(name, sizeof(name));
        struct hostent* _hostent = gethostbyname(name);
        struct in_addr *_addr0 = (struct in_addr *)(_hostent->h_addr_list[0]);

        struct sockaddr_in addr;
        addr.sin_family = AF_INET;
        addr.sin_port = htons(0);
        addr.sin_addr.s_addr = _addr0->s_addr;

        char *tmp = (char *)malloc(6);
        sprintf(tmp, "%d", addr.sin_port);
        char *ip_str = inet_ntoa(*_addr0);

        char *newbuf = (char *)malloc(strlen(argv[1]) + strlen(ip_str) + strlen(tmp) + 5);
        strcpy(newbuf, "r");
        strcat(newbuf, " ");
        strcat(newbuf, argv[1]);
        strcat(newbuf, " ");
        strcat(newbuf, ip_str);
        strcat(newbuf, " ");
        strcat(newbuf, tmp);
        register_broker(newbuf);

        int fd = socket(PF_INET, SOCK_STREAM, 0);
        if ( fd < 0 )
        {
            perror("Error creating socket");
            exit(1);
        }
        if ( bind(fd, (struct sockaddr*)&addr, sizeof(addr)) != 0 )
        {
            perror("Error binding socket");
            exit(1);
        }
        if ( listen(fd, 0x80) != 0 )
        {
            perror("Error listening on socket");
            exit(1);
        }
        handle_requests(fd);
    }
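The `+ 5` in the second malloc() is easier to trust once it is spelled out: one byte for "r", three for the separating spaces, and one for the terminating NUL. As a hedged sketch, the strcpy/strcat chain can be expressed with a single snprintf(). The helper name build_register_msg is hypothetical and does not exist in the recovered binary; only the message layout "r <name> <ip> <port>" is taken from the reconstruction above.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: builds "r <name> <ip> <port>" in a freshly
 * allocated buffer, sized the same way as newbuf in the reconstruction.
 * The caller owns the result and must free() it. Returns NULL on failure. */
char *build_register_msg(const char *name, const char *ip, const char *port)
{
    assert(name != NULL && ip != NULL && port != NULL);

    /* 'r' (1) + three spaces (3) + terminating NUL (1) = 5 extra bytes */
    size_t size = strlen(name) + strlen(ip) + strlen(port) + 5;

    char *msg = malloc(size);
    if (msg == NULL)
        return NULL;

    /* snprintf never writes more than size bytes, including the NUL */
    snprintf(msg, size, "r %s %s %s", name, ip, port);
    return msg;
}
```

For example, build_register_msg("alice", "10.0.0.1", "0") yields "r alice 10.0.0.1 0", and the buffer is filled exactly, with no slack and no overflow. Whether the original source used strcat or something like this cannot be determined from the binary; the strcat version matches the calls actually seen in the disassembly.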

Recovering the other two functions is left as an exercise for the reader :)

+4

There are several tools (you can search on Google), but I would suggest rewriting the code. The time you would spend cleaning up what a decompiler returns is probably greater than the time needed to rewrite it.

I know this seems obvious, but the correct answer would be: restore from backup (which you should have).

+4

Unfortunately, there is no good way to go from a binary back to source. You can try Boomerang, but I really do not expect good results.

+3

First, look for a backup of the source file. Many editors create a backup named filename.bak or filename.c~ each time the file is saved. On a Windows computer, a forensic software tool may be able to retrieve the most recent source file. A tool I wrote, getfile, was previously offered by NTI, which was acquired by Armor Holdings several years ago. I don't know whether it is still available.

If the program runs, running it under the strace utility (available on most Linux distributions) can often help with some aspects of decoding a program, especially if it is I/O-oriented. Alas, if the program mostly processes internal data, this is not very useful. strace produces a log of the system calls the program makes and the parameters it passes; it is sometimes an invaluable tool for understanding program behavior. For example, strace date produces (in part; I skipped the runtime library startup):

    clock_gettime(CLOCK_REALTIME, {1315760058, 681379835}) = 0
    open("/etc/localtime", O_RDONLY) = 3
    fstat64(3, {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0
    fstat64(3, {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0
    mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb78b5000
    read(3, "TZif2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\4\0\0\0\4\0\0\0\0"..., 4096) = 2819
    _llseek(3, -24, [2795], SEEK_CUR) = 0
    read(3, "\nPST8PDT,M3.2.0,M11.1.0\n", 4096) = 24
    _llseek(3, 2818, [2818], SEEK_SET) = 0
    close(3) = 0
    munmap(0xb78b5000, 4096) = 0
    fstat64(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 0), ...}) = 0
    mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb78b5000
    write(1, "Sun Sep 11 09:54:18 PDT 2011\n", 29Sun Sep 11 09:54:18 PDT 2011) = 29
    close(1) = 0
    munmap(0xb78b5000, 4096) = 0
    close(2) = 0

Once you have something worth saving:

  • Add some source control (git, svn, cvs, ...), possibly more than one
  • Use an auto-build tool like make to avoid stupid mistakes
  • Back up from time to time. Even when I'm at a client site with stone knives and bearskins, I can still email the source files to myself as a last-ditch backup mechanism.
+2

You can use dcc. But next time you should use Git ;)

+1

You can try disassembling with objdump -d <filename> .

You can also look at the symbol names with the nm utility to jog your memory and help you rewrite the source.
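To give an idea of what nm reads, here is a rough sketch that lists the function symbols in the .symtab section of a 64-bit ELF file. It assumes a Linux system with <elf.h>, a 64-bit little-endian binary, and well-formed headers (offsets are not bounds-checked, so do not run it on untrusted files). The function name count_function_symbols is made up for this example; in practice nm itself is the better tool.

```c
#include <assert.h>
#include <elf.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Prints each function symbol from .symtab to out (if out is not NULL)
 * and returns how many were found. Returns 0 for a stripped binary and
 * -1 if the file cannot be read or is not a 64-bit ELF file. */
long count_function_symbols(const char *path, FILE *out)
{
    assert(path != NULL);

    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;

    /* Read the whole file into memory. */
    long size = -1;
    if (fseek(f, 0, SEEK_END) == 0)
        size = ftell(f);
    unsigned char *img = NULL;
    if (size >= (long)sizeof(Elf64_Ehdr) && fseek(f, 0, SEEK_SET) == 0)
        img = malloc((size_t)size);
    if (img == NULL || fread(img, 1, (size_t)size, f) != (size_t)size) {
        free(img);
        fclose(f);
        return -1;
    }
    fclose(f);

    /* Check the ELF magic number and the 64-bit class. */
    const Elf64_Ehdr *eh = (const Elf64_Ehdr *)img;
    if (memcmp(eh->e_ident, ELFMAG, SELFMAG) != 0 ||
        eh->e_ident[EI_CLASS] != ELFCLASS64) {
        free(img);
        return -1;
    }

    /* Walk the section headers looking for the static symbol table. */
    const Elf64_Shdr *sh = (const Elf64_Shdr *)(img + eh->e_shoff);
    long count = 0;
    for (unsigned i = 0; i < eh->e_shnum; i++) {
        if (sh[i].sh_type != SHT_SYMTAB)
            continue;

        const Elf64_Sym *sym = (const Elf64_Sym *)(img + sh[i].sh_offset);
        /* sh_link names the string table that holds the symbol names. */
        const char *names = (const char *)(img + sh[sh[i].sh_link].sh_offset);
        size_t n = sh[i].sh_size / sizeof(Elf64_Sym);

        for (size_t j = 0; j < n; j++) {
            if (ELF64_ST_TYPE(sym[j].st_info) != STT_FUNC || sym[j].st_name == 0)
                continue;
            if (out != NULL)
                fprintf(out, "%s\n", names + sym[j].st_name);
            count++;
        }
    }

    free(img);
    return count;
}
```

Calling count_function_symbols("./source", stdout) on an unstripped build would print names such as main alongside the C runtime's own helper functions; a result of 0 suggests the binary was stripped and the names are gone.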

The commercial IDA Pro disassembler / debugger is popular in software reverse engineering. Unfortunately, reverse engineering a binary file is slow and difficult.

0
