Writing a virtual machine - well-formed bytecode?

I'm writing a virtual machine in C just for fun. Lame, I know, but luckily I'm on SO, so hopefully no one will make fun :)

I wrote a very quick-and-dirty virtual machine that reads lines of (my own) ASM and executes them. Right now I have only 3 instructions: add, jmp, end. All is well, and it's actually pretty cool being able to feed it lines (by doing something like write_line(&prog[1], "jmp", regA, regB, 0);) and then running the program:

    while (machine.code_pointer <= BOUNDS && DONE != true) {
        run_line(&prog[machine.cp]);
    }

I'm using an opcode lookup table (which may not be efficient, but it's elegant) in C, and everything seems to be working fine.

My question is more of a "best practices" question, but I do think there is a correct answer. I'm making the VM able to read binary files (storing the bytes in an unsigned char[]) and execute bytecode. My question is: is it the VM's job to make sure the bytecode is well-formed, or is it just the compiler's job to make sure the binary file it spits out is well-formed?

I only ask because of what would happen if someone edited the binary file and messed it up (deleted arbitrary parts of it, etc.). Obviously, the program would be buggy and probably wouldn't work. Is this even the VM's problem? I'm sure people much smarter than me have already worked out solutions to these problems.

+7
c assembly programming-languages vm-implementation
6 answers

Is it the VM's job to make sure the bytecode is well-formed, or is it just the compiler's job to make sure the binary file it spits out is well-formed?

You get to decide.

Best practice is to have the VM perform a single check before execution, at a cost proportional to the size of the program, that is sophisticated enough to guarantee nothing unexpected can happen during execution. Then, during the actual execution of the bytecode, you run with no checks. However, the check-before-running idea can require some very sophisticated analysis, and even the most performance-conscious VMs often keep some checks at run time (example: array bounds).

For a hobby project, I'd keep it simple and have the VM check sanity every time it executes an instruction. The overhead for most instructions won't be too great.
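
As a rough illustration of such a pre-execution pass, here is a minimal sketch in C. The encoding is an assumption made up for this example (4 bytes per instruction: opcode, then three operand bytes; 8 registers; jmp takes an instruction index), and verify_program is a hypothetical name, not something from the question:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical encoding: every instruction is 4 bytes: opcode, a, b, c. */
enum { OP_ADD = 0, OP_JMP = 1, OP_END = 2 };
enum { INSTR_SIZE = 4, NUM_REGS = 8 };

/* One pass over the program, cost proportional to its length.
   Returns true only if every instruction is safe to run unchecked. */
bool verify_program(const unsigned char *code, size_t len)
{
    if (len == 0 || len % INSTR_SIZE != 0)
        return false;                     /* empty or truncated file */

    size_t count = len / INSTR_SIZE;
    for (size_t i = 0; i < count; i++) {
        const unsigned char *ins = code + i * INSTR_SIZE;
        switch (ins[0]) {
        case OP_ADD:                      /* add a, b, c: all are registers */
            if (ins[1] >= NUM_REGS || ins[2] >= NUM_REGS || ins[3] >= NUM_REGS)
                return false;
            break;
        case OP_JMP:                      /* jmp a: a is an instruction index */
            if (ins[1] >= count)
                return false;
            break;
        case OP_END:
            break;
        default:
            return false;                 /* unknown opcode */
        }
    }

    /* The last instruction must not fall through past the end. */
    unsigned char last = code[len - INSTR_SIZE];
    return last == OP_END || last == OP_JMP;
}
```

With only static jump targets this single pass is enough to rule out bad opcodes, bad registers and running off the end of the program. Once jump targets or array indices are computed at run time, some checks have to stay in the interpreter loop.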
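
A minimal sketch of that simpler approach, checking everything right before it is used. The 4-byte instruction layout, the register count, and the names vm, vm_status and vm_step are all assumptions for illustration:

```c
#include <stddef.h>

/* Hypothetical encoding: every instruction is 4 bytes: opcode, a, b, c. */
enum { OP_ADD = 0, OP_JMP = 1, OP_END = 2 };
enum { INSTR_SIZE = 4, NUM_REGS = 8 };

typedef enum {
    VM_OK, VM_DONE, VM_BAD_PC, VM_BAD_OPCODE, VM_BAD_OPERAND
} vm_status;

typedef struct {
    unsigned regs[NUM_REGS];
    size_t   pc;               /* index of the next instruction */
} vm;

/* Execute one instruction, validating everything just before it is used. */
vm_status vm_step(vm *m, const unsigned char *code, size_t len)
{
    size_t count = len / INSTR_SIZE;   /* a trailing partial instruction is ignored */
    if (m->pc >= count)
        return VM_BAD_PC;

    const unsigned char *ins = code + m->pc * INSTR_SIZE;
    switch (ins[0]) {
    case OP_ADD:                       /* regs[a] = regs[b] + regs[c] */
        if (ins[1] >= NUM_REGS || ins[2] >= NUM_REGS || ins[3] >= NUM_REGS)
            return VM_BAD_OPERAND;
        m->regs[ins[1]] = m->regs[ins[2]] + m->regs[ins[3]];
        m->pc++;
        return VM_OK;
    case OP_JMP:                       /* pc = a */
        if (ins[1] >= count)
            return VM_BAD_OPERAND;
        m->pc = ins[1];
        return VM_OK;
    case OP_END:
        return VM_DONE;
    default:
        return VM_BAD_OPCODE;
    }
}
```

The caller loops while vm_step returns VM_OK and treats anything other than VM_DONE as a malformed program.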

+14

The same problem comes up in Java, and as I recall, there the virtual machine has to perform some checks to make sure the bytecode is well-formed. In that situation it really is a serious issue because of the potential security problems: if someone can modify a Java bytecode file to contain something the compiler would never emit (for example, accessing a private variable of another class), it could potentially expose confidential data held in the application's memory, or let the application access a website it should not be allowed to access, or something else. The Java virtual machine includes a bytecode verifier to make sure, as far as possible, that this kind of thing doesn't happen.

Now, in your case, unless your homebrew language takes off and becomes popular, the security aspect is not something you need to worry about much; after all, who is going to hack your programs besides you? Still, I'd say it's a good idea to make sure your VM at least has a reasonable failure strategy for when the bytecode is invalid. At a minimum, if it encounters something it doesn't understand and can't handle, it should detect this and fail with an error message, which will make debugging easier for you.
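
For instance, the failure message could say which opcode was rejected and where. A small sketch, where format_bad_opcode and the message wording are made up for illustration:

```c
#include <stdio.h>
#include <stddef.h>

/* Write a diagnostic for an instruction the VM cannot handle into buf.
   Returns what snprintf returns: the length the full message needs. */
int format_bad_opcode(char *buf, size_t buflen, size_t pc, unsigned char opcode)
{
    return snprintf(buf, buflen,
                    "vm: invalid opcode 0x%02X at instruction %zu",
                    (unsigned)opcode, pc);
}
```

The VM would print the buffer to stderr and stop, instead of silently carrying on with garbage.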

+1

Virtual machines that interpret bytecode typically have some way of validating their input; for example, Java will throw a VerifyError if a class file is in an inconsistent state.

However, it sounds like you're implementing a processor, and since those tend to be lower level, there are fewer ways to get things into a detectably invalid state - giving it an undefined opcode is one obvious way. Real processors signal that the process tried to execute an illegal instruction, and the OS handles it (Linux, for example, delivers SIGILL, which kills the process by default).
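
With an opcode lookup table like the one the question mentions, one way to mimic this is to point every undefined table slot at an "illegal instruction" handler. A minimal sketch; the cpu struct, the trap names and the 4-byte instruction layout are assumptions for illustration:

```c
#include <stddef.h>

/* Hypothetical encoding: every instruction is 4 bytes: opcode + 3 operands. */
enum { OP_ADD = 0, OP_JMP = 1, OP_END = 2 };

typedef enum { TRAP_NONE, TRAP_ILLEGAL_INSTRUCTION, TRAP_BAD_FETCH } trap;

typedef struct {
    unsigned regs[8];
    size_t   pc;
    int      halted;
    trap     pending;          /* why the CPU stopped, if it trapped */
} cpu;

typedef void (*handler)(cpu *, const unsigned char *operands);

/* Register numbers are masked to 0..7 so a bad operand cannot index
   outside regs (a simplification chosen for this sketch). */
static void op_add(cpu *c, const unsigned char *o)
{
    c->regs[o[0] & 7] = c->regs[o[1] & 7] + c->regs[o[2] & 7];
    c->pc++;
}
static void op_jmp(cpu *c, const unsigned char *o) { c->pc = o[0]; }
static void op_end(cpu *c, const unsigned char *o) { (void)o; c->halted = 1; }
static void op_illegal(cpu *c, const unsigned char *o)
{
    (void)o;
    c->pending = TRAP_ILLEGAL_INSTRUCTION;   /* the VM's analogue of SIGILL */
    c->halted = 1;
}

static handler table[256];

/* Every opcode byte gets a handler; undefined ones all trap. */
void init_table(void)
{
    for (size_t i = 0; i < 256; i++)
        table[i] = op_illegal;
    table[OP_ADD] = op_add;
    table[OP_JMP] = op_jmp;
    table[OP_END] = op_end;
}

/* Fetch and dispatch one instruction; count is the number of instructions. */
void cpu_step(cpu *c, const unsigned char *code, size_t count)
{
    if (c->halted)
        return;
    if (c->pc >= count) {                    /* fetch outside the program */
        c->pending = TRAP_BAD_FETCH;
        c->halted = 1;
        return;
    }
    const unsigned char *ins = code + c->pc * 4;
    table[ins[0]](c, ins + 1);
}
```

Because the table covers all 256 possible byte values, the dispatch itself can never index out of range, and the host decides what to do with a pending trap.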

+1

If you are worried about someone having edited the binary, then there is only one answer to your question: the VM must do the checking. That is the only way you have a chance of detecting the tampering. The compiler merely creates the binary; it has no way of detecting what happens to it afterwards.

+1

It makes sense to have the compiler do as much sanity checking as possible (since that only has to be done once), but there will always be problems that static analysis cannot detect, such as [cough] stack overflows, array range errors, and the like.

0

I'd say it is legitimate for your VM to let the emulated processor catch fire, as long as the VM implementation itself doesn't crash. As the VM implementor, you get to set the rules. But if you want virtual hardware companies to virtually buy your virtual chip, you'll have to do something a little more forgiving of errors: good options might be to raise an exception (harder to implement) or to reset the processor (much easier). Or maybe you just define every opcode to be valid, except that some of them are "undocumented" - they do something unspecified, other than crashing your implementation. Rationale: if (!) your VM implementation is meant to run several guest instances at the same time, it would be very bad if one guest could cause the others to fail.
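
A rough sketch of the "reset the processor" option with per-guest state, so a fault in one guest cannot touch another. The guest struct, the function names and the 4-byte instruction layout are assumptions for illustration:

```c
#include <string.h>
#include <stddef.h>

/* Hypothetical encoding: every instruction is 4 bytes: opcode, a, b, c. */
enum { OP_ADD = 0, OP_JMP = 1, OP_END = 2, INSTR_SIZE = 4, NUM_REGS = 8 };

typedef struct {
    unsigned regs[NUM_REGS];
    size_t   pc;
    unsigned resets;           /* how many times this guest has been reset */
    int      halted;
} guest;

/* Put one guest back into its power-on state. */
static void guest_reset(guest *g)
{
    memset(g->regs, 0, sizeof g->regs);
    g->pc = 0;
    g->resets++;
}

/* Run one instruction for one guest. Any fault resets this guest only. */
void guest_step(guest *g, const unsigned char *code, size_t len)
{
    size_t count = len / INSTR_SIZE;
    if (g->halted)
        return;
    if (g->pc >= count) {              /* ran or jumped outside the program */
        guest_reset(g);
        return;
    }

    const unsigned char *ins = code + g->pc * INSTR_SIZE;
    switch (ins[0]) {
    case OP_ADD:
        if (ins[1] >= NUM_REGS || ins[2] >= NUM_REGS || ins[3] >= NUM_REGS) {
            guest_reset(g);
            return;
        }
        g->regs[ins[1]] = g->regs[ins[2]] + g->regs[ins[3]];
        g->pc++;
        break;
    case OP_JMP:
        g->pc = ins[1];                /* a bad target is caught on the next step */
        break;
    case OP_END:
        g->halted = 1;
        break;
    default:                           /* undefined opcode */
        guest_reset(g);
        break;
    }
}
```

A guest whose program is broken will simply keep resetting, much like real hardware stuck in a boot loop, while the host and the other guests carry on; the resets counter lets the host give up on it after some limit.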

0
