How can I use bytecode to optimize the runtime of dynamic languages?

I'm interested in optimization methods or general bytecode schemes that can speed up execution in a VM compared to interpreting the AST directly.

2 answers

The main gain of bytecode over interpreting an AST is in operation dispatch; for highly optimized interpreters this becomes a real problem. "Dispatch" is the term for the overhead required to begin executing an operation (arithmetic, property access, etc.).

A fairly basic AST-based interpreter would look something like this:

 class ASTNode {
 public:
     virtual ~ASTNode() {}
     virtual double execute() = 0;
 };

 class NumberNode : public ASTNode {
 public:
     virtual double execute() { return m_value; }
     double m_value;
 };

 class AddNode : public ASTNode {
 public:
     virtual double execute() { return m_left->execute() + m_right->execute(); }
     ASTNode* m_left;
     ASTNode* m_right;
 };

Thus, executing code as simple as 1+1 requires 3 virtual calls. Virtual calls are very expensive (in the grand scheme of things) because of the multiple indirections needed to make the call, and because of the general cost of making a call in the first place.

In a bytecode interpreter the dispatch model is different; instead of virtual calls you have an execution loop, akin to:

 while (1) {
     switch (op.type) {
     case op_add:
         // Efficient interpreters use "registers" rather than a stack
         // these days, but the example code would be more complicated
         push(pop() + pop());
         continue;
     case op_end:
         return pop();
     }
 }

This still has a fairly expensive dispatch cost relative to native code, but it is much faster than virtual dispatch. You can improve performance further by using the GCC extension called "computed goto", which lets you remove the dispatch switch and reduce the total dispatch cost to a single indirect branch.

Beyond cheaper dispatch, bytecode interpreters have a number of additional advantages over AST interpreters, mainly because bytecode can jump directly to other locations, just as a real machine can. Consider, for example, a code fragment like this:

 while (1) {
     ...statements...
     if (a)
         break;
     else
         continue;
 }

To implement this correctly in an AST interpreter, every time a statement is executed you need to check whether execution should remain in the loop or leave it, so the execution loop becomes something like this:

 while (condition->execute() == true) {
     for (i = 0; i < statements->length(); i++) {
         result = statements[i]->execute();
         if (result.type == BREAK)
             goto loop_done; // leave the whole loop
         if (result.type == CONTINUE)
             break; // back to the condition check
     }
 }
 loop_done: ;

As you add more forms of flow control, this signaling becomes more and more expensive. Once you add exceptions (a form of flow control that can trigger anywhere), you need to check for these things even in the middle of basic arithmetic, which drives costs up further. If you want to see this in the real world, I recommend looking at the ECMAScript specification, which describes the execution model in terms of an AST interpreter.

In a bytecode interpreter these problems mostly go away, because the bytecode can express control flow directly rather than indirectly through signaling; continue simply becomes a jump instruction, and you only pay for it when the jump is actually hit.

Finally, an AST interpreter is by definition recursive, so it must guard against overflowing the system stack; this puts quite severe limits on how deeply nested your code can be, for example:

 1+(1+(1+(1+(1+(1+(1+(1+1))))))) 

costs the interpreter (at least) 8 levels of recursion, and that can be very significant. Older versions of Safari (pre-SquirrelFish) used an AST interpreter, and for this reason JS was allowed only a few hundred levels of recursion, versus 1000 in modern browsers.


Perhaps you could also look at the various passes that the LLVM "opt" tool provides. These are bytecode-to-bytecode optimizations, and the tool itself can report an analysis of the benefit of applying a particular optimization.

