What is the fastest x86 virtual machine design?

I am implementing a virtual machine in x86 assembly, and I wonder what design will give the best results. What should I focus on to squeeze out the most performance? I will implement the entire virtual machine in x86 assembly.

I have a small set of instructions, and I can choose their form. The instructions correspond directly to the Smalltalk syntax used in blocks. Here is an outline of the instruction forms I was thinking about:

    ^ ...                # return
    ^null                # return nothing
    object               # address to object
    ... selector: ...    # message send (in this case arity 1, selector #selector:)
    var := ...           # set var
    var                  # get var

The kind of VM I was thinking about:

    mov eax, [esi]
    add esi, 2
    mov ecx, eax
    and eax, 0xff
    and ecx, 0xff00    # *256
    shr ecx, 5         # *8
    jmp [ecx*4 + operations]

    align 8
    operations:
    dd retnull
    dd ret
    # and so on...

    retnull:           # jumps here for retnull
    # ... retnull action
    ret:
    # ... ret action
    # etc.

Don't start asking why I need another virtual machine implementation. Interpreters are not off-the-shelf parts that you simply pick up when you need one. Most virtual machines on offer elsewhere are weighted toward portability at the cost of performance. My goal is not portability; my goal is performance.

The reason an interpreter is needed at all is that Smalltalk blocks cannot all be handled the same way:

    A := B subclass: [
        def a: x [^ x*x]
        clmet b [...]
        def c [...]
        def d [...]
    ]

    [ 2 < x ] whileTrue: [...]

    (i isNeat) ifTrue: [...] ifFalse: [...]

    List fromBlock: [
        "carrots"
        "apples"
        "oranges" toUpper
    ]

What I need is the real benefit of interpretation: choosing the context in which a block is evaluated. Of course, a good compiler should simply compile the obvious cases, such as ifTrue:ifFalse:, whileTrue:, or the list example above. But the need for an interpreter does not just disappear, because you can always run into a situation where you cannot be sure a block will receive the treatment you expect.

+4
6 answers

I see that there is some confusion about portability, so I feel obligated to clarify some issues. These are my humble opinions, so of course you can object to them.

I assume that you have already come across http://www.complang.tuwien.ac.at/forth/threading/ if you are serious about writing a virtual machine, so I will not dwell on the techniques described there.

It has already been mentioned that targeting a virtual machine has several advantages, such as reduced code size and reduced compiler complexity (which often means faster compilation), plus portability (note that the point of a virtual machine is portability of the language, so it does not matter if the VM itself is not portable).

Given the dynamic nature of your example, your virtual machine will look more like a JIT compiler than the more common designs do. So even though S. Lott's answer misses the point in this case, his mention of Forth is very much in place. If I were building a virtual machine for a very dynamic language, I would split interpretation into two stages:

  • A producer stage, which pulls from the AST stream on demand and converts it into a more meaningful form (for example, taking a block and deciding whether to execute it immediately or to store it somewhere for later execution), possibly emitting new kinds of tokens. Essentially, you recover here the context-sensitive information that may have been lost during parsing.

  • A consumer stage, which takes the stream generated by stage 1 and executes it blindly, like any other machine. If a block has been processed before, you can just replay the saved stream and be done with it, instead of jumping around with the instruction pointer.

As you say, simply imitating how a processor works does not by itself give you any of the dynamism (or any other property, such as safety) that you need. Otherwise you would just be writing a compiler.

Of course, you can add arbitrarily complex optimizations in step 1.

+4

If you need something really fast, try LLVM. It can generate native code for most processors from a high-level program description. You can either go through its assembler-like language or build the LLVM IR in memory, skipping the assembly phase, depending on what you find most convenient.

I'm not sure how well it fits your problem, but it is definitely what I would use if I needed to execute performance-critical code that cannot be compiled with the rest of the program.

+2

Interpretation overhead is in most cases tolerable. The fastest approach I can come up with is to generate x86 code in memory directly, as JIT compilers do, but then of course you no longer have an interpreter. You have a compiler.

However, I'm not sure that writing the interpreter in assembly will give you much better performance (unless you are an assembly guru and your project is very limited in scope). Using a higher-level language lets you focus on better algorithms, such as symbol lookup and register allocation strategies.

+1

You can speed up your dispatch routine by not encoding opcodes at all; instead, store each handler's offset from the opcode table directly in the instruction stream:

    mov eax, [esi]
    add esi, 4
    add eax, pOpcodeTable
    jmp eax

which should have an overhead of about 4 cycles per dispatch on Pentium 4 and later CPUs.

In addition, for performance it is better to increment ESI (the VM instruction pointer) inside each primitive routine instead, since chances are high that the increment can then be paired with other instructions:

    mov eax, [esi]
    add eax, pOpcodeTable
    jmp eax

~1-2 cycles of overhead.

+1

I have to ask: why build a virtual machine if the focus is performance? Why not just write x86 code directly? Nothing could be faster.

If you want a very fast interpreted language, check out Forth. Its design is very neat and very easy to copy.

0

If you do not like JIT and portability is not your goal, you might be interested in Google's Native Client project. It does static analysis, sandboxing, and so on, and lets the host execute raw x86 instructions.

0
