Intelligent solution for calculating transition addresses in a bytecode compiler?

Say I implement a bytecode compiler similar to Lua's / Python ... and so on.

I look at the AST, generating bytecode instructions, and I run break inside the if-else inside the while :

 while (cond1) if (cond2) ... else break 

(I tried to write the equivalent bytecode, but it did not look too useful.)

The fact is that in this example there are at least 4 jump commands, and I want to find an elegant solution to fill in the jump addresses when I collect the AST ... I don’t know the jump address for the while or break while until not fully “compiled” internal statements.

  • Pseudo-code solution would be perfect
  • The decision should not depend on whether I implement a case-by-code or stack-based bytecode compiler (I play with both)

I do not read the book of dragons yet.

If I compile AST recursively when I reach the break statement inside any arbitrary number of if-else loops and blocks, how should the compiler know which empty label to go to? I guess some kind of shortcut stack is external to the recursive AST hosting function.

+7
source share
1 answer

The principle you need is called "backpatching": fill in a dummy value for a direct jump, generate a code for the operator’s body, and then go back and replace the dummy value with the real one at the end.

eg.

 # start of codegen for 'while' L1: [evaluate cond1] jne L2 # forward reference: use a dummy value for now # start of codegen for 'if ... else' L3: [evaluate cond2] jne L4 # another forward reference: use a dummy value for now [code in 'if' branch] j L5 # another forward reference: use a dummy value for now L4: [code in 'else' branch before 'break'] j L2 [code in 'else' branch after 'break'] L5: # end of 'if ... else' # now go back and fill in references to L4, L5 inside the 'if ... else' block # end of codegen for 'if ... else' # codegen for 'while' continues... j L1 # loop L2: # end of 'while' loop # now go back and fill in references to L2 inside the 'while' block # end of codegen for 'while' 

Regarding your editing:

If I compile AST recursively when I reach the break statement within a number of loops and if-else blocks, how should the compiler know which empty label to go to? I suppose some kind of label with an inscription external to the recursive AST walk function.

The goal of a jump implementing the break statement is the end of the innermost closed loop; yes, you can track this with an explicit external stack.

But, if you have a recursive AST hosting function, you already have an implicit stack of call frames for recursive function calls, so you probably don't need an explicit one either.

eg. in

 ... while (outer) ... if (outer_done) break ... while (inner) ... if (inner_done) break ... [break from inner 'while' ends up here] ... if (outer_done_2) break ... [break from outer 'while' ends up here] ... 

all code generation for the inner while occurs in the recursive motion of the AST for the body of the outer while . Inside this recursive call, you only need to take care of the inner loop, not the outer loop.

So all you have to do is:

  • keep current backpatching state at the beginning of processing while
  • run a new empty list to track any break that appears in the body of this while
  • processes the body (which may result in a recursive call)
  • apply backlinks
  • then restore the previous state before exiting.

eg. something like that:

 codegen for while: save previous break backpatch list initialise break backpatch list as empty perform codegen for evaluating condition perform codegen for body statements apply backpatches restore previous break backpatch list 

The current break backpatch list should be part of some state that is passed to all codegen functions (you need to add code for break for it). But the saved list can simply be tracked as a local variable of the while codegen function.

+6
source

All Articles