How do I reverse engineer a C library?

Suppose I have a dynamic library (.so) on Linux. I also have an existing application that uses the library. The library is stripped. I would like to create an (approximate) header file for the library so that I can write another program that uses it.

It is easy enough to use objdump to see which functions are in the library, and ltrace to see each call as it is made.

How do I find out what the function arguments are?

Some ideas: I can probably use LD_PRELOAD or a dlsym trick to load a shim library that inspects the stack whenever any function in the original library is called. Perhaps the shim can also dump the registers (this is on ARM, so that would be r0-r3, I suppose). With a lot of work (looking at the disassembly), it may also be possible to work out whether a register holds a pointer that gets dereferenced, and then dump the underlying data located at that pointer.
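The "is this register a pointer, and what does it point at" step can be approximated at run time without reading the disassembly. Below is a minimal sketch, assuming Linux (it reads /proc/self/maps) and assuming the preloaded wrapper library has already captured the four register-sized arguments into an array. The names readable_bytes_at and log_call are made up for this illustration, and the 16-byte dump length is arbitrary.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Returns how many bytes are readable starting at addr, up to the end of
   the mapping that contains it, according to /proc/self/maps
   (Linux-specific). Returns 0 if addr is not inside a readable mapping. */
static size_t readable_bytes_at(uintptr_t addr)
{
    FILE *maps = fopen("/proc/self/maps", "r");
    char line[512];
    size_t avail = 0;

    if (!maps)
        return 0;
    while (avail == 0 && fgets(line, sizeof line, maps)) {
        unsigned long lo, hi;
        char perms[5];
        /* Each entry starts with "start-end perms", e.g. "7f00-7f10 r-xp". */
        if (sscanf(line, "%lx-%lx %4s", &lo, &hi, perms) == 3 &&
            perms[0] == 'r' && addr >= lo && addr < hi)
            avail = (size_t)(hi - addr);
    }
    fclose(maps);
    return avail;
}

/* Logs one intercepted call: prints each of the four values and, for any
   value that lies in readable memory, up to 16 bytes found there.
   "name" is whatever label the caller wants in the log. */
static void log_call(FILE *out, const char *name, const uintptr_t args[4])
{
    for (int i = 0; i < 4; i++) {
        size_t avail = readable_bytes_at(args[i]);

        fprintf(out, "%s arg%d = 0x%lx", name, i, (unsigned long)args[i]);
        if (avail) {
            const unsigned char *p = (const unsigned char *)args[i];
            size_t n = avail < 16 ? avail : 16;

            fputs(" ->", out);
            for (size_t k = 0; k < n; k++)
                fprintf(out, " %02x", p[k]);
        }
        fputc('\n', out);
    }
}
```

In a wrapper built for LD_PRELOAD, each interposed function would call log_call with its incoming arguments and then forward to the real function, which it can look up with dlsym(RTLD_NEXT, ...). This only shows that a value happens to be a valid address, not that the function treats it as a pointer, so small integers are filtered out but the result is still a hint rather than a type.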

It seems like a big step from there to "this function takes as its first argument a pointer to a structure with the following fields ...". Are there any automated tools for this kind of thing?

Note: I am not at all interested in how the functions work, only in how to call them correctly.

1 answer

A good start is a disassembler such as objdump, HopperApp, or IDA Pro. The last of these detects function parameters automatically in simple cases.

If you want to understand for yourself how this works, I would look into the various "calling conventions" (Wikipedia is a good start).

Example for __stdcall: say you have an x86 .so library, and something like this appears in the binary:

    push 3
    push 2
    push 1
    call func    ; void func(int a, int b, int c) where a=1, b=2 and c=3

Arguments are pushed onto the stack in reverse order. EAX, ECX and EDX may be used freely inside the function (they are caller-saved); any other register the function uses must be explicitly saved and restored by the function itself (callee-saved). None of this tells you the data type of an argument. Working that out takes further reverse engineering.
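To see why pushing in reverse order puts the first argument closest to the top of the stack, here is a toy model of the sequence above. It is only a sketch: the struct and function names are invented for this illustration, the stack is a plain array of 4-byte slots, and the return address is a placeholder value.

```c
#include <stdint.h>

/* Toy model of a 32-bit x86 stack: it grows downward in 4-byte slots. */
struct toy_stack {
    uint32_t slot[16];
    int sp;                 /* index of the current top of stack */
};

static void toy_init(struct toy_stack *s)
{
    s->sp = 16;             /* empty stack: sp is one past the last slot */
}

static void toy_push(struct toy_stack *s, uint32_t v)
{
    s->slot[--s->sp] = v;
}

/* The value the callee would read at [esp + 4*(n+1)] on entry, i.e. its
   n-th argument (n = 0 is the first). [esp] itself holds the return
   address pushed by "call". */
static uint32_t toy_arg(const struct toy_stack *s, int n)
{
    return s->slot[s->sp + 1 + n];
}

/* Replays "push 3; push 2; push 1; call func" and stores what func
   would see as a, b and c in out[0], out[1] and out[2]. */
static void replay_call(uint32_t out[3])
{
    struct toy_stack s;

    toy_init(&s);
    toy_push(&s, 3);            /* last argument is pushed first */
    toy_push(&s, 2);
    toy_push(&s, 1);            /* first argument is pushed last */
    toy_push(&s, 0xdeadbeefu);  /* "call" pushes the return address */
    for (int i = 0; i < 3; i++)
        out[i] = toy_arg(&s, i);
}
```

After replay_call, the output array holds the arguments in source order, which is what a disassembler relies on when it maps stack offsets back to parameter positions.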

Even IDA Pro does not detect all of this information automatically, because it depends on many factors and can get very complex :)
