Questions regarding implementing a simple processor emulator

Background information: ultimately, I would like to write an emulator of a real machine, such as the original Nintendo or the Game Boy. However, I decided that I needed to start somewhere much, much easier. My advisor/professor offered me the specification of a very simple imaginary processor that he created for me to emulate first. There is one register (the accumulator) and there are 16 opcodes. Each instruction consists of 16 bits: the first 4 contain the opcode and the rest are the operand. Instructions are given as strings in binary format, for example, "0101 0101 0000 1111".

My question is: in C++, what is the best way to parse the instructions for processing? Please keep my ultimate goal in mind. Here are a few points I have considered:

  • I cannot just process and execute instructions as they are read, because the code is self-modifying: an instruction can change a later instruction. The only workaround I can see is to record every change and, for each instruction, check whether any recorded change applies to it. That could mean a huge number of comparisons for every executed instruction, which is not good. So I think I need to recompile the instructions into a different format.

  • Although I could parse the opcode as a string and process it that way, there are times when an instruction as a whole must be treated as a number. For example, an increment opcode could even modify the opcode section of an instruction.

  • If I converted the instructions to integers, I'm not sure how I could parse only the opcode or only the operand. Even if I recompiled each instruction into three parts (the whole instruction as an int, the opcode as an int, and the operand as an int), that still would not solve the problem, since I might need to increment the whole instruction and later decode the affected opcode or operand. Also, would I need to write a function to perform this conversion, or is there a C++ library function that converts a string in "binary format" to an integer (like Integer.parseInt(str1, 2) in Java)?

  • In addition, I would like to be able to perform operations such as bit shifting. I'm not sure how this can be achieved, but it can affect how I implement this recompilation.
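For reference on the library question in the third bullet: the C++ standard library does have an equivalent of Java's Integer.parseInt(str1, 2), namely std::stoul with a base argument of 2 (declared in <string>); std::bitset<16> is another option. A minimal sketch, assuming the spaces in strings like "0101 0101 0000 1111" are only there for readability; the helper names are hypothetical.

```cpp
#include <bitset>
#include <cstdint>
#include <string>

// Removes the readability spaces: "0101 0101 0000 1111" -> "0101010100001111".
std::string strip_spaces(const std::string& text)
{
    std::string digits;
    for (char c : text) {
        if (c != ' ') digits += c;
    }
    return digits;
}

// Option 1: std::stoul with base 2, the closest analogue of Java's Integer.parseInt(str, 2).
// It throws std::invalid_argument if no digits can be parsed, but it silently stops at the
// first non-binary character, so validate the input separately if that matters.
std::uint16_t parse_with_stoul(const std::string& text)
{
    return static_cast<std::uint16_t>(std::stoul(strip_spaces(text), nullptr, 2));
}

// Option 2: std::bitset<16>, whose constructor throws std::invalid_argument
// if the string contains any character other than '0' or '1'.
std::uint16_t parse_with_bitset(const std::string& text)
{
    return static_cast<std::uint16_t>(std::bitset<16>(strip_spaces(text)).to_ulong());
}
```

Both functions turn "0101 0101 0000 1111" into 0x550F (21775). The bitset version is stricter about bad input, which is useful when loading hand-written programs.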

Thanks for any help or advice you can offer!

Tags: c++ binary emulation machine-code

4 answers

Parse the source code into an array of integers. This array is your machine's memory.

Use bitwise operations to extract various fields. For example, this:

    unsigned int x = 0xfeed;
    unsigned int opcode = (x >> 12) & 0xf;

will extract the top four bits (0xf, here) from the 16-bit value stored in the unsigned int. Then you can use, for example, a switch() to check the opcode and take the correct action:

    #include <cstdio>

    enum { OP_ADD = 0 };  // opcode values as they appear in the top 4 bits

    // Executes the instruction at memory[pc] and returns the address of the next one.
    unsigned int execute(unsigned int *memory, unsigned int pc)
    {
        const unsigned int opcode = (memory[pc++] >> 12) & 0xf;
        switch (opcode) {
        case OP_ADD:
            /* Do whatever the ADD instruction definition mandates. */
            return pc;
        default:
            fprintf(stderr, "** Non-implemented opcode %x found in location %x\n", opcode, pc - 1);
        }
        return pc;
    }

Modifying memory (including the program itself) is just a matter of writing integers into your array, using some bit manipulation if necessary.
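To make that concrete for the case raised in the question (an increment that spills from the operand into the opcode): once instructions are integers, no special handling is needed, because the next fetch simply decodes the new value. A minimal sketch, assuming 16-bit words in a std::vector<std::uint16_t> and a hypothetical increment that wraps at 16 bits:

```cpp
#include <cstdint>
#include <vector>

// Opcode in the top 4 bits, operand in the low 12 bits, as in the question.
unsigned int opcode_of(std::uint16_t word) { return (word >> 12) & 0xF; }
unsigned int operand_of(std::uint16_t word) { return word & 0x0FFF; }

// A self-modifying write is an ordinary integer update of the memory array.
// Incrementing the whole word wraps at 16 bits (a hypothetical choice); a carry
// out of the operand field changes the opcode, and the next decode sees it.
void increment_word(std::vector<std::uint16_t>& memory, unsigned int address)
{
    memory.at(address) = static_cast<std::uint16_t>(memory.at(address) + 1);
}
```

For example, incrementing 0x1FFF (opcode 1, operand 0xFFF) gives 0x2000, which decodes as opcode 2 with operand 0.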

I think the best approach is to read the instructions, convert them to unsigned integers and store them in memory, and then execute them from memory.
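That approach boils down to a small fetch-decode-execute loop. In this sketch everything about the instruction set (the opcode numbers, the operand being a memory address, the names Machine and run) is a hypothetical stand-in, not your professor's actual specification:

```cpp
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical opcode assignments; the real ones come from your specification.
enum Opcode : unsigned int { HALT = 0x0, LOAD = 0x1, STORE = 0x2, ADD = 0x3, JUMP = 0x4 };

struct Machine {
    std::vector<std::uint16_t> memory;  // program and data share one array
    std::uint16_t accumulator = 0;      // the single register
    unsigned int pc = 0;                // program counter (word address)
};

// Runs until HALT. Each instruction is fetched from memory on every step,
// so an earlier STORE into the program area is seen automatically.
void run(Machine& m)
{
    for (;;) {
        const std::uint16_t word = m.memory.at(m.pc++);
        const unsigned int opcode = (word >> 12) & 0xF;
        const unsigned int operand = word & 0x0FFF;  // used here as an address
        switch (opcode) {
        case HALT:  return;
        case LOAD:  m.accumulator = m.memory.at(operand); break;
        case STORE: m.memory.at(operand) = m.accumulator; break;
        case ADD:   m.accumulator = static_cast<std::uint16_t>(m.accumulator + m.memory.at(operand)); break;
        case JUMP:  m.pc = operand; break;
        default:    throw std::runtime_error("unimplemented opcode");
        }
    }
}
```

Because nothing is cached between steps, the program { LOAD 3, STORE 2, JUMP 0, 0 } halts: the STORE overwrites the JUMP at address 2 with 0 (HALT) before it is fetched.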

  • After you parse the instructions and store them in memory, self-modification is much simpler than saving a list of changes for each instruction. You can simply change the memory at that location (provided that you never need to know what the old instruction was).

  • Since you are converting the instructions to integers, this issue is moot.

  • To extract the opcode and operand fields, you use bit shifting and masking. For example, to get the opcode, you mask the top 4 bits and shift them down 12 bits ((instruction & 0xf000) >> 12, or simply instruction >> 12 when the value is exactly 16 bits wide). Similarly, a mask (instruction & 0x0fff) gives you the operand.

  • Do you mean that your machine has instructions that shift bits? That should not affect how you store operands. When you execute one of these instructions, you can simply use the C++ << and >> bit-shift operators.
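The shifting and masking from the last two bullets, as a sketch: field accessors, an encoder for writing a patched instruction back, and two hypothetical shift instructions for the emulated accumulator. What happens to bits shifted out, and to shift counts of 16 or more, is up to your spec; here they are simply discarded, which is an assumption.

```cpp
#include <cstdint>

constexpr std::uint16_t OPCODE_MASK  = 0xF000;  // top 4 bits
constexpr std::uint16_t OPERAND_MASK = 0x0FFF;  // low 12 bits

constexpr unsigned int opcode(std::uint16_t instr)  { return (instr & OPCODE_MASK) >> 12; }
constexpr unsigned int operand(std::uint16_t instr) { return instr & OPERAND_MASK; }

// Reassembles an instruction, e.g. when the emulated program patches one field.
constexpr std::uint16_t encode(unsigned int op, unsigned int arg)
{
    return static_cast<std::uint16_t>(((op & 0xF) << 12) | (arg & OPERAND_MASK));
}

// Hypothetical shift instructions on a 16-bit accumulator. The cast back to
// std::uint16_t drops bits shifted past bit 15. Counts of 16 or more clear the
// register, which also avoids shifting the promoted int by its full width (undefined in C++).
constexpr std::uint16_t shift_left(std::uint16_t acc, unsigned int amount)
{
    return static_cast<std::uint16_t>(amount >= 16 ? 0 : acc << amount);
}

constexpr std::uint16_t shift_right(std::uint16_t acc, unsigned int amount)
{
    return static_cast<std::uint16_t>(amount >= 16 ? 0 : acc >> amount);
}
```

With these, opcode(0xfeed) is 0xf, operand(0xfeed) is 0xeed, and encode(0xf, 0xeed) gives back 0xfeed.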

In case it helps, here is the most recent processor emulator I wrote in C++. Actually, it is the only emulator I have written in C++.

The specification language is a little idiosyncratic, but it is a perfectly respectable, simple description of a virtual machine, possibly very similar to your professor's VM:

http://www.boundvariable.org/um-spec.txt

Here is my (somewhat reworked) code, which should give you some ideas. For example, it shows how to implement the math operators in a giant switch statement in um.cpp:

http://www.eschatonic.org/misc/um.zip

You can find other implementations to compare against with a web search, since a lot of people took part in the contest (I was not one of them; I did this much later), although I suspect not many of them are in C++.

If I were you, I would store the instructions as strings to begin with only if your virtual machine specification defines operations on them as strings. Then convert them to integers as needed, every time you want to execute them. It will be slow, but so what? Your virtual machine is not one you are going to use to run mission-critical programs, and a slow interpreter still illustrates the important points you need to learn at this stage.
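For the strings-first option, a minimal sketch of converting on every execution (slow, as noted, but simple). The names are hypothetical, and the write-back format (16 '0'/'1' characters with no spaces) is an assumption about what your spec expects:

```cpp
#include <bitset>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Program memory kept as the original strings, e.g. "0101 0101 0000 1111".
using StringMemory = std::vector<std::string>;

// Converts one stored string to a number at the moment it is executed.
// std::stoul stops at the first space, so the spaces are removed first.
std::uint16_t fetch_as_integer(const StringMemory& memory, std::size_t address)
{
    std::string digits;
    for (char c : memory.at(address)) {
        if (c != ' ') digits += c;
    }
    return static_cast<std::uint16_t>(std::stoul(digits, nullptr, 2));
}

// Writing back goes the other way: integer -> 16-character binary string (no spaces).
void store_as_string(StringMemory& memory, std::size_t address, std::uint16_t value)
{
    memory.at(address) = std::bitset<16>(value).to_string();
}
```

Every fetch pays for a string-to-integer conversion, which is exactly the cost the integer-memory approach avoids.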

It is possible that the virtual machine actually defines everything in terms of integers, and the strings are simply meant to describe the program as it is loaded into the machine. In that case, convert the program to integers at the start. If the VM stores programs and data together, with the same operations acting on both, then this is the way to go.

The way to choose between them is to look at the opcode that is used to modify the program. Is the new instruction passed to it as an integer or as a string? Whichever it is, the easiest way to start is probably to store the program in that format. You can always change it once it works.

In the case of the UM (Universal Machine) described above, the machine is defined in terms of "platters" with room for 32 bits. Obviously, these can be represented in C++ as 32-bit integers, so that is what my implementation does.
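Along the same lines, a hedged sketch of how a 32-bit platter might be decoded and a few of the arithmetic operators dispatched in a switch, based on my reading of the linked um-spec.txt (opcode in the top 4 bits; registers A, B, C in bits 6-8, 3-5 and 0-2; the orthography operator's A in bits 25-27 with a 25-bit value). Please check the field positions against the spec itself; this is not the code in um.zip.

```cpp
#include <array>
#include <cstdint>
#include <stdexcept>

using Platter = std::uint32_t;  // each platter holds 32 bits

struct Registers {
    std::array<Platter, 8> r{};  // eight general-purpose registers, zero-initialised
};

// Standard operators: opcode in the top 4 bits, registers A, B, C in the low 9 bits.
inline unsigned int um_opcode(Platter p) { return p >> 28; }
inline unsigned int reg_a(Platter p) { return (p >> 6) & 7; }
inline unsigned int reg_b(Platter p) { return (p >> 3) & 7; }
inline unsigned int reg_c(Platter p) { return p & 7; }

// A few operators from the spec. Unsigned 32-bit arithmetic wraps modulo 2^32.
void execute_math(Registers& regs, Platter p)
{
    Platter& a = regs.r[reg_a(p)];
    const Platter b = regs.r[reg_b(p)];
    const Platter c = regs.r[reg_c(p)];
    switch (um_opcode(p)) {
    case 3: a = b + c; break;      // addition
    case 4: a = b * c; break;      // multiplication
    case 5:                        // division (unsigned)
        // The spec treats division by zero as a machine failure; throwing is an assumption.
        if (c == 0) throw std::domain_error("division by zero");
        a = b / c;
        break;
    case 6: a = ~(b & c); break;   // not-and
    case 13:                       // orthography: load a 25-bit value into the register in bits 25-27
        regs.r[(p >> 25) & 7] = p & 0x1FFFFFF;
        break;
    default: throw std::runtime_error("not handled in this sketch");
    }
}
```

The remaining operators (array handling, I/O, program loading) would be further cases in the same switch.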

I created an emulator for a custom cryptographic processor. I used C++ polymorphism, building a tree of base classes:

    #include <cstddef>
    #include <string>

    struct Instruction  // Contains methods & data common to all instructions.
    {
        virtual ~Instruction() = default;  // instructions are deleted through base pointers
        virtual void execute() = 0;
        virtual std::size_t get_instruction_size() const = 0;
        virtual unsigned int get_opcode() const = 0;
        virtual const std::string& get_instruction_name() const = 0;
    };

    class Math_Instruction : public Instruction
    {
        // Operations common to all math instructions.
    };

    class Branch_Instruction : public Instruction
    {
        // Operations common to all branch instructions.
    };

    class Add_Instruction : public Math_Instruction
    {
    };

I also had several factories. At least two will be helpful:

  • A factory to create instructions from text.
  • A factory to create instructions from an opcode.
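A sketch of the second factory (instructions from an opcode), assuming the 16-bit, 4-bit-opcode format from the question. The Machine state, the two concrete classes, their opcode values, and make_instruction are all hypothetical placeholders:

```cpp
#include <cstdint>
#include <memory>
#include <stdexcept>
#include <string>

struct Machine { std::uint16_t accumulator = 0; };  // hypothetical machine state

struct Instruction {
    explicit Instruction(unsigned int arg) : operand(arg) {}
    virtual ~Instruction() = default;
    virtual void execute(Machine& m) const = 0;
    virtual unsigned int get_opcode() const = 0;
    unsigned int operand;
};

// Hypothetical concrete instructions; the opcode values are placeholders.
struct Load_Immediate : Instruction {
    using Instruction::Instruction;
    void execute(Machine& m) const override { m.accumulator = static_cast<std::uint16_t>(operand); }
    unsigned int get_opcode() const override { return 0x1; }
};

struct Add_Immediate : Instruction {
    using Instruction::Instruction;
    void execute(Machine& m) const override
    {
        m.accumulator = static_cast<std::uint16_t>(m.accumulator + operand);
    }
    unsigned int get_opcode() const override { return 0x3; }
};

// Factory: builds the right Instruction subclass from a 16-bit instruction word.
std::unique_ptr<Instruction> make_instruction(std::uint16_t word)
{
    const unsigned int opcode = (word >> 12) & 0xF;
    const unsigned int operand = word & 0x0FFF;
    switch (opcode) {
    case 0x1: return std::make_unique<Load_Immediate>(operand);
    case 0x3: return std::make_unique<Add_Immediate>(operand);
    default:  throw std::invalid_argument("unknown opcode " + std::to_string(opcode));
    }
}
```

One design consequence for the self-modifying machine in the question: if the program is held as objects, a write into program memory has to rebuild the affected object by calling the factory again on the new word.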

The instruction classes should have methods for loading their data from an input source (for example, a std::istream) or from text (a std::string). Corresponding output methods (such as returning the instruction name and opcode) should also be supported.

I had the application create objects from the input file and put them in a vector of Instruction pointers. The executor method called the execute() method of each instruction in the vector. This call trickled down to the leaf instruction object, which performed the detailed execution.
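A sketch of such an executor, assuming a hypothetical Cpu state object and a single Add_Instruction leaf. The step() method is an assumption about how a single-step debugger could hook in, not part of the design described above:

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct Cpu {                 // hypothetical machine state shared by all instructions
    long accumulator = 0;
    std::size_t pc = 0;      // index into the program vector
};

struct Instruction {
    virtual ~Instruction() = default;
    virtual void execute(Cpu& cpu) const = 0;  // leaf classes do the real work
    virtual const std::string& get_instruction_name() const = 0;
};

struct Add_Instruction : Instruction {
    explicit Add_Instruction(long v) : value(v) {}
    void execute(Cpu& cpu) const override { cpu.accumulator += value; }
    const std::string& get_instruction_name() const override
    {
        static const std::string name = "ADD";
        return name;
    }
    long value;
};

// Executor: one call to step() runs one instruction, which is also the hook a
// single-step debugger needs. The pc is advanced before execute() so that a
// branch instruction can overwrite it.
class Executor {
public:
    explicit Executor(std::vector<std::unique_ptr<Instruction>> program)
        : program_(std::move(program)) {}

    bool step()  // returns false once the program has run off the end
    {
        if (cpu_.pc >= program_.size()) return false;
        const Instruction& current = *program_[cpu_.pc++];
        current.execute(cpu_);  // virtual dispatch to the leaf class
        return true;
    }

    void run() { while (step()) {} }
    const Cpu& cpu() const { return cpu_; }

private:
    std::vector<std::unique_ptr<Instruction>> program_;
    Cpu cpu_;
};
```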

There are other global objects that may need emulation. In my case, some of them included a data bus, registers, ALUs, and memory locations.

Please spend more time designing and thinking about the project before you start coding. I found this a rather difficult task, especially implementing a single-step debugger and a graphical interface.

Good luck
