Sliding frame but does not return to C

My programming language compiles to C, I want to implement tail recursion optimization. The question here is how to transfer control to another function without โ€œreturningโ€ from the current function.

This is quite easy if the control is passed to the same function:

void f() { __begin: do something here... goto __begin; // "call" itself } 

As you can see, there is no return value and no parameters; they are passed in a separate stack addressed to the global variable.

Another option is to use the built-in assembly:

 #ifdef __clang__ #define tail_call(func_name) asm("jmp " func_name " + 8"); #else #define tail_call(func_name) asm("jmp " func_name " + 4"); #endif void f() { __begin: do something here... tail_call(f); // "call" itself } 

This is similar to goto , but since goto transfers control to the first statement in the function, skipping the "input code" generated by the compiler, jmp is different, this argument is a pointer to the function, and you need to add 4 or 8 bytes to skip the input code.

Both of the above will work, but only if the caller and the caller use the same stack amount for local variables that are allocated by the callerโ€™s input code.

I thought of doing leave manually with the built-in assembly, then replacing the return address on the stack, and then making a legal function call, like f() . But my attempts all failed. You need to somehow change the BP and SP .

So how to implement this for x64? (Again, assuming functions have no arguments and return void ). A portable way without an integrated assembly is better, but the assembly is accepted. Maybe you can use longjump ?

Perhaps you can even push the called address on the stack by replacing the original return address and just ret ?

+8
c compiler-construction tail-recursion
source share
4 answers

Do not try to do it yourself. A good C compiler can in many cases perform tail call removal and will do so. On the contrary, hacking using the built-in assembly has good chances of making mistakes that are difficult to debug.

For example, see this snippet at godbolt.org. To duplicate it here:

The C code I use is:

 int foo(int n, int o) { if (n == 0) return o; puts("***\n"); return foo(n - 1, o + 1); } 

Compiled for:

 .LC0: .string "***\n" foo: test edi, edi je .L4 push r12 mov r12d, edi push rbp mov ebp, esi push rbx mov ebx, edi .L3: mov edi, OFFSET FLAT:.LC0 call puts sub ebx, 1 jne .L3 lea eax, [r12+rbp] pop rbx pop rbp pop r12 ret .L4: mov eax, esi ret 

Note that the tail call has been resolved. The only call is puts .

+4
source share

Since you do not need arguments and return values, how to combine all functions into one and use labels instead of function names?

 f: __begin: ... CALL(h); // a macro implementing traditional call ... if (condition_ret) RETURN; // a macro implementing traditional return ... goto g; // tail recurse to g 

The hard part here is the RETURN and CALL macros. To return, you must save another stack, the setjump buffer stack, so when you return, you call longjump (ret_stack.pop ()), and when you call, you do ret_stack.push (setjump (f)). This is a poetic performance c, you will need to fill in the details.

gcc may offer some optimization here with computed goto, they are easier than longjump. Also, people who write vms have similar problems and apparently have asm-based solutions for those even on MSVC, see Example here .

And finally, such an approach, even if it saves memory, can be confusing for the compiler, and therefore can lead to performance anomalies. Maybe you better create some kind of portable assembly language, maybe llvm? Not sure there must be something that goto figured out.

+1
source share

A worthy approach to this problem is to use trampolines. Essentially, each compiled function returns a pointer to the function (and possibly an argument counter). The top level is a closed loop, which, starting with your main , simply calls the return pointer of the ad infinitum function. You can use the longjmp function to exit the loop, i.e. To complete the program.

See this SO Q & A. Or Google "tco trampoline recursion."

For another approach, see Cheney in the MTA , where the stack only grows to full, which starts the GC. This works as soon as the program is converted to a Continued Transfer Style (CPS), as functions never return in this style; therefore, after GC, the stack is all garbage and can be reused.

+1
source share

I suggest hacking. The x86 call command, which the compiler uses to translate function calls, pushes the return address onto the stack and then jumps.

What you can do is manipulate the stack a bit using some built-in assembly and maybe some macros to get a little better from the headache. You basically have to rewrite the return address on the stack, which you can do immediately in the called function. You may have a wrapper function that overwrites the return address and calls your function - the control flow then returns to the shell, which then moves to where you pointed.

0
source share

All Articles