Stack-based bytecode versus three-address code

When developing a bytecode interpreter, is there currently a consensus on whether a stack-based or a three-address format is better (or something else)? I have these considerations in mind:

  • The target language is a dynamic language, quite similar to JavaScript.

  • Performance is important, but development speed and portability matter more at the moment.

  • Therefore, the implementation will be strictly an interpreter; a JIT compiler may come later, resources permitting.

  • The interpreter will be written in C.

+8
compilation bytecode interpreter language-implementation
5 answers

Take a look at the OCaml bytecode interpreter, one of the fastest of its kind. It is essentially a stack machine, translated into threaded code at startup (using the GNU computed-goto extension). You could also generate Forth-style threaded code, which should be relatively easy to do.
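For the curious, here is a minimal sketch of that dispatch style in C. The opcode names and stack layout are invented for illustration, and this is not OCaml's actual code; the &&label / computed-goto syntax is a GNU extension supported by GCC and Clang:

    #include <stdio.h>

    /* Hypothetical opcodes; only the dispatch technique matters here. */
    enum { OP_PUSH, OP_ADD, OP_PRINT, OP_HALT };

    static void run(const int *code)
    {
        /* One label address per opcode; each handler jumps straight to the next. */
        static void *dispatch[] = { &&op_push, &&op_add, &&op_print, &&op_halt };
        int stack[64], *sp = stack;

    #define NEXT() goto *dispatch[*code++]
        NEXT();

    op_push:  *sp++ = *code++;        NEXT();
    op_add:   sp--; sp[-1] += sp[0];  NEXT();
    op_print: printf("%d\n", sp[-1]); NEXT();
    op_halt:  return;
    #undef NEXT
    }

    int main(void)
    {
        const int prog[] = { OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_PRINT, OP_HALT };
        run(prog);   /* prints 5 */
        return 0;
    }

There is no central switch: every handler jumps straight to the next handler's code, which is what threaded code buys you over a plain switch-based loop.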

But if you have future JIT compilation in mind, make sure your stack machine is not a genuine, full-featured stack machine, but rather a serialized form of an expression tree (as in the .NET CLI) - that way it will be easier to translate your stack bytecode into three-address form and then into SSA.
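To make that concrete, here is a hedged sketch (all opcode and variable names invented) of why the restriction helps: if the stack code is really just a serialized expression tree, you can turn it into three-address code by walking it with a compile-time stack that holds temporary names rather than runtime values:

    #include <stdio.h>

    enum { PUSH, ADD, STORE, END };

    /* Symbolically execute the stack code: the stack holds the names of
     * temporaries, never actual values, and each operation prints one
     * three-address statement. */
    static void to_three_address(const int *code)
    {
        int stack[16], sp = 0, temp = 0;

        for (;;) {
            switch (*code++) {
            case PUSH:
                printf("t%d = v%d\n", temp, *code++);   /* load variable slot */
                stack[sp++] = temp++;
                break;
            case ADD: {
                int rhs = stack[--sp], lhs = stack[--sp];
                printf("t%d = t%d + t%d\n", temp, lhs, rhs);
                stack[sp++] = temp++;
                break;
            }
            case STORE:
                printf("v%d = t%d\n", *code++, stack[--sp]);
                break;
            case END:
                return;
            }
        }
    }

    int main(void)
    {
        /* v0 = v1 + v2, i.e. a = b + c */
        const int prog[] = { PUSH, 1, PUSH, 2, ADD, STORE, 0, END };
        to_three_address(prog);
        return 0;
    }

This simple walk only works because nothing interesting is left on the stack across control-flow joins, which is exactly the restriction suggested above.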

0

Read The Evolution of Lua and The Implementation of Lua 5.0 for how and why Lua changed from a stack-based virtual machine to a register-based virtual machine, and how it achieves good performance.
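For reference, Lua 5.0 encodes each instruction as a single fixed-width word holding an opcode and three operand fields (A, B, C), so a statement like a = b + c becomes one ADD over registers. The macros below sketch that idea; the field widths and bit positions are illustrative and not copied from the Lua sources:

    #include <stdint.h>
    #include <stdio.h>

    /* One 32-bit instruction word: small opcode field plus three operands. */
    typedef uint32_t Instruction;

    #define OPCODE(i)  ((i) & 0x3F)           /* 6 bits  */
    #define ARG_A(i)   (((i) >>  6) & 0xFF)   /* 8 bits  */
    #define ARG_B(i)   (((i) >> 14) & 0x1FF)  /* 9 bits  */
    #define ARG_C(i)   (((i) >> 23) & 0x1FF)  /* 9 bits  */

    #define MAKE_ABC(op, a, b, c) \
        ((Instruction)(op) | ((Instruction)(a) << 6) | \
         ((Instruction)(b) << 14) | ((Instruction)(c) << 23))

    enum { OP_ADD = 13 };   /* made-up opcode number */

    int main(void)
    {
        /* R(A) := R(B) + R(C): a = b + c in one instruction word */
        Instruction i = MAKE_ABC(OP_ADD, 0, 1, 2);
        printf("op=%u A=%u B=%u C=%u\n",
               (unsigned)OPCODE(i), (unsigned)ARG_A(i),
               (unsigned)ARG_B(i), (unsigned)ARG_C(i));
        return 0;
    }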

+7

The experiments of David Gregg and Roberto Ierusalimschy showed that register-based bytecode performs better than stack-based bytecode, because fewer bytecode instructions are needed to do the same work (and therefore there is less decoding and dispatch overhead). By that measure, the three-address format is a clear winner.
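As a rough illustration of that argument (the encodings here are invented), the single statement a = b + c costs four decode/dispatch steps on a typical stack machine but only one on a three-address machine:

    #include <stdio.h>

    enum { PUSH_LOCAL, ADD, STORE_LOCAL };   /* stack-style ops    */
    enum { R_ADD };                          /* register-style op  */

    /* a = b + c on a stack machine: four instructions, four dispatches. */
    static const unsigned char stack_prog[] = {
        PUSH_LOCAL, 1,   /* push b            */
        PUSH_LOCAL, 2,   /* push c            */
        ADD,             /* pop two, push sum */
        STORE_LOCAL, 0   /* pop into a        */
    };

    /* The same statement on a register machine: one instruction, one dispatch. */
    static const unsigned char reg_prog[] = {
        R_ADD, 0, 1, 2   /* r0 = r1 + r2      */
    };

    int main(void)
    {
        printf("stack version: 4 instructions; register version: 1 instruction\n");
        return 0;
    }

Note that each register instruction is usually wider than a stack instruction, so total code size can still favour the stack encoding; the win claimed here is in the number of instructions decoded and dispatched.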

+5

I have little (actually no) experience in this area, so you may want to verify some of this for yourself (or perhaps someone can correct me where necessary?).

The two languages I'm currently working with are C# and Java, so I'm naturally biased toward their methodologies. As most people know, both are compiled to bytecode, and both platforms (the CLR and the JVM) use a JIT (at least in the major implementations). Also, I would guess that the JIT compiler for each platform is written in C/C++, but I don't know that for sure.

All of these languages and their respective platforms are fairly similar to your situation (except for the dynamic part, but I'm not sure how much that matters). Also, since they are such mainstream languages, I'm sure their implementations can serve as a good guide for your design.


From that perspective, I know for sure that both the CLR and the JVM are stack-based architectures. Some of the benefits I remember for stack-based versus register-based are (see the sketch after the list):

  • Smaller generated code
  • Simpler interpreters
  • Simpler compilers
  • etc.
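Here is the sketch mentioned above, illustrating the "simpler compilers" point: generating stack code for an expression is just a post-order walk of the tree, with no register allocation or temporaries to manage (the node layout and opcode names are invented):

    #include <stdio.h>

    /* A tiny expression tree: leaves are variable slots, interior nodes are '+'. */
    typedef struct Node {
        int is_leaf;
        int slot;                       /* valid when is_leaf  */
        const struct Node *lhs, *rhs;   /* valid when !is_leaf */
    } Node;

    /* Post-order walk: operands first, then the operator. That is the
     * whole code generator for a stack machine.                        */
    static void emit(const Node *n)
    {
        if (n->is_leaf) {
            printf("PUSH v%d\n", n->slot);
        } else {
            emit(n->lhs);
            emit(n->rhs);
            printf("ADD\n");
        }
    }

    int main(void)
    {
        /* (b + c) + d  ==>  PUSH v1, PUSH v2, ADD, PUSH v3, ADD */
        const Node b = {1, 1, 0, 0}, c = {1, 2, 0, 0}, d = {1, 3, 0, 0};
        const Node bc = {0, 0, &b, &c}, expr = {0, 0, &bc, &d};
        emit(&expr);
        return 0;
    }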

In addition, I believe that stack-based bytecode is a little more intuitive and readable, but that is subjective, and as I said, I haven't looked at much bytecode yet.

Some benefits of a register-based architecture (again, a sketch follows the list):

  • Fewer instructions need to be executed.
  • Faster interpreters (follows from #1)
  • Easier translation to machine code, since most mainstream hardware is register-based
  • etc.
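And a matching sketch for the register side (again, everything here is invented for illustration): each instruction names its operands directly, so the interpreter does less stack traffic and fewer dispatches per statement, at the cost of a somewhat more involved compiler:

    #include <stdio.h>

    /* Hypothetical three-address instruction: dst = src1 (op) src2. */
    typedef struct { unsigned char op, dst, src1, src2; } Instr;
    enum { R_LOADK, R_ADD, R_PRINT, R_HALT };

    static void run(const Instr *code, const int *consts)
    {
        int reg[16];
        for (;;) {
            Instr i = *code++;
            switch (i.op) {
            case R_LOADK: reg[i.dst] = consts[i.src1];             break;
            case R_ADD:   reg[i.dst] = reg[i.src1] + reg[i.src2];  break;
            case R_PRINT: printf("%d\n", reg[i.src1]);             break;
            case R_HALT:  return;
            }
        }
    }

    int main(void)
    {
        const int consts[] = { 2, 3 };
        const Instr prog[] = {
            { R_LOADK, 0, 0, 0 },   /* r0 = 2       */
            { R_LOADK, 1, 1, 0 },   /* r1 = 3       */
            { R_ADD,   2, 0, 1 },   /* r2 = r0 + r1 */
            { R_PRINT, 0, 2, 0 },   /* print r2     */
            { R_HALT,  0, 0, 0 },
        };
        run(prog, consts);          /* prints 5 */
        return 0;
    }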

Of course, there are always ways to compensate for the drawbacks of each, but I think these cover the obvious things to consider.

+1

If you have a JIT in mind, then bytecode is the only option.

For what it's worth, you can take a look at my TIScript: http://www.codeproject.com/KB/recipes/TIScript.aspx and its sources: http://code.google.com/p/tiscript/

0
