Assembler Design Pattern

Question


I am writing an 8051 assembler.

First there is a tokenizer that reads the next token, sets error flags, recognizes EOF, etc.
Then there is the main compiler loop that reads the next token and checks it against the valid mnemonics:

    mnemonic = NextToken();
    if (mnemonic.Error)
    {
        // throw some error
    }
    else if (mnemonic.Text == "ADD")
    {
        ...
    }
    else if (mnemonic.Text == "ADDC")
    {
        ...
    }

And this goes on for many more cases. Worse, inside each case there is code that checks the parameters and then converts them into machine code. Right now it looks like this:

    if (mnemonic.Text == "MOV")
    {
        arg1 = NextToken();
        if (arg1.Error) { /* throw error */ break; }
        arg2 = NextToken();
        if (arg2.Error) { /* throw error */ break; }

        if (arg1.Text == "A")
        {
            if (arg2.Text == "B")
                output << 0x1234; // example compiled code
            else if (arg2.Text == "@B")
                output << 0x5678; // example compiled code
            else
                { /* throw "Invalid parameters" */ }
        }
        else if (arg1.Text == "B")
        {
            if (arg2.Text == "A")
                output << 0x9ABC; // example compiled code
            else if (arg2.Text == "@A")
                output << 0x0DEF; // example compiled code
            else
                { /* throw "Invalid parameters" */ }
        }
    }

For each mnemonic I have to check the parameters and then generate the correct machine code, so very similar parameter-checking code is repeated in every case.

So, is there a design pattern to improve this code?
Or just an easier way to implement this?

Edit: I accepted plinth's answer, thanks to him. However, if you have any other ideas about this, I will be happy to hear them. Thanks to everyone.


c++ assembly compiler-construction design-patterns compiler-design

Hossein Apr 7 '11 at 19:49


5 answers

Yes. Most assemblers use a data table that describes instructions: mnemonic, op code, operand forms, etc.

I suggest looking at the source code for as. Take a look here. (Thanks to Hossein.)
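A minimal sketch of the data-table idea, assuming only the accumulator-destination forms of ADD, ADDC and MOV (A,Rn / A,direct / A,@Ri / A,#data). The names Operand, InstrForm, kTable and Encode are hypothetical, not taken from as, and the opcodes come from the 8051 instruction set; verify them against the MCS-51 manual before relying on them. One generic Encode function replaces the per-mnemonic if/else ladder from the question.

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Form of the source operand; the destination is always A in this subset.
enum class Operand { Reg, Direct, Indirect, Immediate };  // Rn, direct, @Ri, #data

struct InstrForm {
    std::string mnemonic;  // e.g. "ADD"
    Operand src;           // which operand form this row describes
    std::uint8_t opcode;   // base opcode; Rn and @Ri add the register number
};

// One row per (mnemonic, operand form). Adding an instruction means adding rows.
const std::vector<InstrForm> kTable = {
    {"ADD",  Operand::Immediate, 0x24}, {"ADD",  Operand::Direct, 0x25},
    {"ADD",  Operand::Indirect,  0x26}, {"ADD",  Operand::Reg,    0x28},
    {"ADDC", Operand::Immediate, 0x34}, {"ADDC", Operand::Direct, 0x35},
    {"ADDC", Operand::Indirect,  0x36}, {"ADDC", Operand::Reg,    0x38},
    {"MOV",  Operand::Immediate, 0x74}, {"MOV",  Operand::Direct, 0xE5},
    {"MOV",  Operand::Indirect,  0xE6}, {"MOV",  Operand::Reg,    0xE8},
};

// Returns the encoded bytes, or nullopt if the mnemonic/operand combination
// does not exist or the operand value is out of range.
// regOrValue is the register number (Rn: 0-7, @Ri: 0-1) or the operand byte.
std::optional<std::vector<std::uint8_t>> Encode(const std::string& mnemonic,
                                                Operand src, int regOrValue) {
    for (const InstrForm& f : kTable) {
        if (f.mnemonic != mnemonic || f.src != src) continue;
        switch (src) {
            case Operand::Reg:
                if (regOrValue < 0 || regOrValue > 7) return std::nullopt;
                return std::vector<std::uint8_t>{static_cast<std::uint8_t>(f.opcode + regOrValue)};
            case Operand::Indirect:
                if (regOrValue < 0 || regOrValue > 1) return std::nullopt;
                return std::vector<std::uint8_t>{static_cast<std::uint8_t>(f.opcode + regOrValue)};
            default:  // Direct or Immediate: opcode followed by one operand byte
                if (regOrValue < 0 || regOrValue > 0xFF) return std::nullopt;
                return std::vector<std::uint8_t>{f.opcode, static_cast<std::uint8_t>(regOrValue)};
        }
    }
    return std::nullopt;  // no such mnemonic/operand form in the table
}
```

For example, Encode("MOV", Operand::Reg, 3) yields the single byte 0xEB, and an unsupported combination simply returns nullopt instead of needing its own error branch.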


wallyk Apr 7 '11 at 19:54


I think you should look into the Visitor pattern. It may not make your code much simpler, but it will reduce coupling and increase reuse. SableCC is a Java compiler-compiler framework that uses it extensively.
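A minimal sketch of the Visitor pattern applied to an assembler, assuming the parser produces a list of nodes. LabelNode, InstructionNode and AddressCounter are hypothetical names. Each pass (address assignment, code generation, listing output) becomes its own visitor, while the node classes stay unchanged.

```cpp
#include <string>
#include <utility>
#include <vector>

struct LabelNode;
struct InstructionNode;

// A visitor declares one Visit overload per node type.
struct Visitor {
    virtual ~Visitor() = default;
    virtual void Visit(const LabelNode& n) = 0;
    virtual void Visit(const InstructionNode& n) = 0;
};

struct Node {
    virtual ~Node() = default;
    virtual void Accept(Visitor& v) const = 0;  // double-dispatch entry point
};

struct LabelNode : Node {
    std::string name;
    explicit LabelNode(std::string n) : name(std::move(n)) {}
    void Accept(Visitor& v) const override { v.Visit(*this); }
};

struct InstructionNode : Node {
    std::string mnemonic;
    int size;  // bytes this instruction occupies (assumed computed by the parser)
    InstructionNode(std::string m, int s) : mnemonic(std::move(m)), size(s) {}
    void Accept(Visitor& v) const override { v.Visit(*this); }
};

// One pass = one visitor: this one only assigns an address to each label.
// A code-generation pass would be a second visitor over the same nodes.
struct AddressCounter : Visitor {
    int pc = 0;
    std::vector<std::pair<std::string, int>> labels;
    void Visit(const LabelNode& n) override { labels.emplace_back(n.name, pc); }
    void Visit(const InstructionNode& n) override { pc += n.size; }
};
```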


Anthony vallée-dubois Apr 7 '11 at 19:55


Have you looked at the Command pattern?

http://en.wikipedia.org/wiki/Command_pattern

The general idea would be to create an object that processes each command (i.e. each mnemonic) and a lookup table that maps each command to its handler class. Each command class would share a common interface (e.g. Command.Execute(*args)), which would definitely give you a cleaner, more flexible design than your current huge switch statement.
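A minimal sketch of that lookup table in C++14 or later. Command, NoOperandCommand and CommandTable are hypothetical names, and only two no-operand 8051 instructions are registered (NOP = 0x00, RET = 0x22); handlers for instructions with operands would implement the same Execute interface and do their own operand checking.

```cpp
#include <cstdint>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

using Bytes = std::vector<std::uint8_t>;
using Args = std::vector<std::string>;

// Common interface every command handler implements.
struct Command {
    virtual ~Command() = default;
    virtual Bytes Execute(const Args& args) const = 0;
};

// Handler for any instruction that takes no operands and encodes to one byte.
struct NoOperandCommand : Command {
    std::uint8_t opcode;
    explicit NoOperandCommand(std::uint8_t op) : opcode(op) {}
    Bytes Execute(const Args& args) const override {
        if (!args.empty()) throw std::invalid_argument("instruction takes no operands");
        return {opcode};
    }
};

// Mnemonic -> handler table; replaces the big if/else-if chain.
class CommandTable {
public:
    void Register(const std::string& mnemonic, std::unique_ptr<Command> cmd) {
        table_[mnemonic] = std::move(cmd);
    }
    Bytes Assemble(const std::string& mnemonic, const Args& args) const {
        auto it = table_.find(mnemonic);
        if (it == table_.end()) throw std::invalid_argument("unknown mnemonic: " + mnemonic);
        return it->second->Execute(args);
    }
private:
    std::map<std::string, std::unique_ptr<Command>> table_;
};
```

The main loop then reduces to one Assemble call per line; supporting a new mnemonic means registering another handler rather than editing a switch.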


Brandon moretz Apr 7 '11 at 19:55


When I was playing with a microcode emulator tool, I converted everything into descendants of an Instruction class. Derived from Instruction were category classes such as Arithmetic_Instruction and Branch_Instruction. I used the Factory pattern to instantiate them.
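A minimal sketch of that hierarchy plus a factory, reusing the class names from this answer. The 8051 mnemonic-to-category assignments and the registry-based MakeInstruction function are illustrative assumptions. Because the mnemonic-to-class mapping lives in one table, the rest of the assembler only ever handles Instruction pointers.

```cpp
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

class Instruction {
public:
    explicit Instruction(std::string m) : mnemonic_(std::move(m)) {}
    virtual ~Instruction() = default;
    virtual std::string Category() const = 0;
    const std::string& Mnemonic() const { return mnemonic_; }
private:
    std::string mnemonic_;
};

class Arithmetic_Instruction : public Instruction {
public:
    using Instruction::Instruction;  // inherit the constructor
    std::string Category() const override { return "arithmetic"; }
};

class Branch_Instruction : public Instruction {
public:
    using Instruction::Instruction;
    std::string Category() const override { return "branch"; }
};

using Creator = std::unique_ptr<Instruction> (*)(const std::string&);

template <class T>
std::unique_ptr<Instruction> Create(const std::string& mnemonic) {
    return std::make_unique<T>(mnemonic);
}

// The only place that knows which mnemonic maps to which class.
const std::map<std::string, Creator> kRegistry = {
    {"ADD",  &Create<Arithmetic_Instruction>},
    {"ADDC", &Create<Arithmetic_Instruction>},
    {"SUBB", &Create<Arithmetic_Instruction>},
    {"SJMP", &Create<Branch_Instruction>},
    {"LJMP", &Create<Branch_Instruction>},
    {"JZ",   &Create<Branch_Instruction>},
};

// Factory function: looks up the creator and builds the right subclass.
std::unique_ptr<Instruction> MakeInstruction(const std::string& mnemonic) {
    auto it = kRegistry.find(mnemonic);
    if (it == kRegistry.end()) throw std::invalid_argument("unknown mnemonic: " + mnemonic);
    return it->second(mnemonic);
}
```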

The best approach is to get the assembly language syntax specification. Write a lexer to convert the source into tokens (please do not use if-elseif-else ladders). Then generate code based on the semantics.
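A sketch of a line lexer that classifies characters instead of comparing whole strings in an if/else-if ladder. The token set (identifiers, numbers, comma, colon, # and @) and the ';' comment convention are assumptions about the 8051 syntax being targeted. Identifiers are upper-cased on the assumption that the assembler is case-insensitive, and numeric text is kept raw for the parser to convert.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

enum class Tok { Ident, Number, Comma, Colon, Hash, At, End };

struct Token {
    Tok kind;
    std::string text;
};

// Splits one source line into tokens. Mnemonics, registers and labels all come
// out as Ident; deciding what they mean is the parser's job, not the lexer's.
std::vector<Token> Lex(const std::string& line) {
    std::vector<Token> out;
    std::size_t i = 0;
    while (i < line.size()) {
        unsigned char c = static_cast<unsigned char>(line[i]);
        if (std::isspace(c)) { ++i; continue; }
        if (c == ';') break;  // comment runs to end of line
        if (std::isalpha(c) || c == '_') {
            std::size_t start = i;
            while (i < line.size() &&
                   (std::isalnum(static_cast<unsigned char>(line[i])) || line[i] == '_')) ++i;
            std::string word = line.substr(start, i - start);
            for (char& ch : word) ch = static_cast<char>(std::toupper(static_cast<unsigned char>(ch)));
            out.push_back({Tok::Ident, word});
            continue;
        }
        if (std::isdigit(c)) {  // raw numeric text such as 42, 0x12 or 12h
            std::size_t start = i;
            while (i < line.size() && std::isalnum(static_cast<unsigned char>(line[i]))) ++i;
            out.push_back({Tok::Number, line.substr(start, i - start)});
            continue;
        }
        switch (c) {
            case ',': out.push_back({Tok::Comma, ","}); break;
            case ':': out.push_back({Tok::Colon, ":"}); break;
            case '#': out.push_back({Tok::Hash, "#"}); break;
            case '@': out.push_back({Tok::At, "@"}); break;
            default: throw std::invalid_argument(std::string("unexpected character: ") + line[i]);
        }
        ++i;
    }
    out.push_back({Tok::End, ""});
    return out;
}
```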

Traditionally, assemblers made at least two passes: the first to define constants and generate skeletal code (including the symbol tables), and the second to fill in the more specific or absolute values.
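A minimal sketch of the two-pass idea, assuming each source line has already been parsed into a hypothetical Line record (optional label, size in bytes, optional symbolic operand). Pass one only advances a location counter and records label addresses; pass two can then resolve forward references such as a jump to a label defined further down.

```cpp
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical pre-parsed source line.
struct Line {
    std::string label;   // empty if the line has no label
    int size;            // bytes the instruction occupies
    std::string target;  // symbolic operand to resolve, empty if none
};

// Pass 1: advance the location counter and record each label's address.
std::map<std::string, int> BuildSymbolTable(const std::vector<Line>& prog, int origin = 0) {
    std::map<std::string, int> symbols;
    int pc = origin;
    for (const Line& l : prog) {
        if (!l.label.empty() && !symbols.emplace(l.label, pc).second)
            throw std::runtime_error("duplicate label: " + l.label);
        pc += l.size;
    }
    return symbols;
}

// Pass 2: every label is now known, so forward references can be filled in.
// Returns one entry per line: the resolved address, or -1 if no symbolic operand.
std::vector<int> ResolveTargets(const std::vector<Line>& prog,
                                const std::map<std::string, int>& symbols) {
    std::vector<int> resolved;
    for (const Line& l : prog) {
        if (l.target.empty()) { resolved.push_back(-1); continue; }
        auto it = symbols.find(l.target);
        if (it == symbols.end()) throw std::runtime_error("undefined symbol: " + l.target);
        resolved.push_back(it->second);
    }
    return resolved;
}
```

In a real assembler pass two would also emit the bytes, and for 8051 relative jumps such as SJMP it would convert the resolved address into a signed offset and range-check it.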

Have you read the Dragon book recently?


Thomas Matthews Apr 7 '11 at 20:09


plinth · Accepted Answer · 2011-04-07T20:11:01+0000

I've written a number of assemblers over the years doing hand parsing, and frankly, you are probably better off using a grammar language and a parser generator.

Here's why: a typical line of assembly will probably look something like this:

 [label:] [instruction|directive][newline]

and the instruction will be:

 plain-mnemonic|mnemonic-withargs

and the directive will be:

 plain-directive|directive-withargs

and so on.

With a decent parser generator like GOLD, you can knock out an 8051 grammar in a few hours. The advantage of this over hand parsing is that you can support fairly complex expressions in your assembler, for example:

    .define kMagicNumber 0xdeadbeef
    CMPA #(2 * kMagicNumber + 1)

which can be a real bear to do by hand.
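To give a sense of what "by hand" involves, here is a minimal recursive-descent evaluator for operands like the one above. It is a sketch under stated assumptions: the grammar (integers, symbols, + - * / and parentheses), the ExprEvaluator name and the std::map symbol table are hypothetical, and a GOLD-generated parser would replace all of it. Numbers are parsed with std::stoll in base 0, so 0x prefixes work, a leading 0 means octal, and 8051 suffixes such as 12h are not handled.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>

// Grammar:
//   expr   := term   (('+' | '-') term)*
//   term   := factor (('*' | '/') factor)*
//   factor := number | identifier | '(' expr ')'
class ExprEvaluator {
public:
    // The symbol table is held by reference and must outlive the evaluator.
    ExprEvaluator(std::string src, const std::map<std::string, std::int64_t>& syms)
        : s_(std::move(src)), syms_(syms) {}

    std::int64_t Evaluate() {
        std::int64_t v = Expr();
        SkipSpace();
        if (pos_ != s_.size()) throw std::invalid_argument("trailing characters in expression");
        return v;
    }

private:
    std::string s_;
    const std::map<std::string, std::int64_t>& syms_;
    std::size_t pos_ = 0;

    void SkipSpace() {
        while (pos_ < s_.size() && std::isspace(static_cast<unsigned char>(s_[pos_]))) ++pos_;
    }
    bool Accept(char c) {  // consume c if it is the next non-space character
        SkipSpace();
        if (pos_ < s_.size() && s_[pos_] == c) { ++pos_; return true; }
        return false;
    }
    std::int64_t Expr() {
        std::int64_t v = Term();
        for (;;) {
            if (Accept('+')) v += Term();
            else if (Accept('-')) v -= Term();
            else return v;
        }
    }
    std::int64_t Term() {
        std::int64_t v = Factor();
        for (;;) {
            if (Accept('*')) v *= Factor();
            else if (Accept('/')) {
                std::int64_t d = Factor();
                if (d == 0) throw std::invalid_argument("division by zero");
                v /= d;
            } else return v;
        }
    }
    std::int64_t Factor() {
        if (Accept('(')) {
            std::int64_t v = Expr();
            if (!Accept(')')) throw std::invalid_argument("expected ')'");
            return v;
        }
        std::size_t start = pos_;  // Accept already skipped leading spaces
        while (pos_ < s_.size() &&
               (std::isalnum(static_cast<unsigned char>(s_[pos_])) || s_[pos_] == '_')) ++pos_;
        std::string word = s_.substr(start, pos_ - start);
        if (word.empty()) throw std::invalid_argument("expected number, symbol or '('");
        if (std::isdigit(static_cast<unsigned char>(word[0])))
            return std::stoll(word, nullptr, 0);  // base 0: decimal, 0x hex, leading-0 octal
        auto it = syms_.find(word);
        if (it == syms_.end()) throw std::invalid_argument("undefined symbol: " + word);
        return it->second;
    }
};
```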

If you want to do this by hand, create a table of all your mnemonics that also records the valid addressing modes each one supports and, for each addressing mode, the number of bytes that form takes and its opcode. Something like this:

    #include <cstring>

    enum AddressingMode {
        Implied  = 1,
        Direct   = 2,
        Extended = 4,
        Indexed  = 8
        // etc
    };

    /* For a 4-char mnemonic, this struct will be 5 bytes. A typical small processor
     * has on the order of 100 instructions, making this table come in at ~500 bytes when all
     * is said and done.
     * Binary searching it takes, worst case, 8 compares on the mnemonic.
     * I claim that I/O will take way more time than look up.
     * You will also need a table and/or a routine that, given a mnemonic and addressing mode,
     * will give you the actual opcode.
     */
    struct InstructionInfo {
        char Mnemonic[4];      // not NUL-terminated when all 4 chars are used
        char AddressingModes;  // bitmask of AddressingMode values
    };

    /* order them by mnemonic */
    static InstructionInfo instrs[] = {
        { {'A', 'D', 'D', '\0'}, Direct|Extended|Indexed },
        { {'A', 'D', 'D', 'A'},  Direct|Extended|Indexed },
        { {'S', 'U', 'B', '\0'}, Direct|Extended|Indexed },
        { {'S', 'U', 'B', 'A'},  Direct|Extended|Indexed }
        /* etc */
    };

    static int nInstrs = sizeof(instrs) / sizeof(InstructionInfo);

    InstructionInfo *GetInstruction(const char *mnemonic)
    {
        /* binary search for mnemonic (table must stay sorted) */
        if (std::strlen(mnemonic) > 4) return nullptr;
        int lo = 0, hi = nInstrs - 1;
        while (lo <= hi) {
            int mid = (lo + hi) / 2;
            int cmp = std::strncmp(mnemonic, instrs[mid].Mnemonic, 4);
            if (cmp == 0) return &instrs[mid];
            if (cmp < 0) hi = mid - 1; else lo = mid + 1;
        }
        return nullptr;
    }

    int InstructionSize(AddressingMode mode)
    {
        switch (mode) {
        case Implied: return 1;
        /* etc: sizes of the other modes depend on the CPU */
        default: return -1;
        }
    }

Then you will have a list of every instruction, each of which carries the set of addressing modes it supports.

So, your parser will become something like this:

    char *line = ReadLine();
    int nextStart = 0;

    int labelLen;
    char *label = GetLabel(line, &labelLen, nextStart, &nextStart);           // may be empty

    int mnemonicLen;
    char *mnemonic = GetMnemonic(line, &mnemonicLen, nextStart, &nextStart);  // may be empty

    if (IsOpcode(mnemonic, mnemonicLen)) {
        AddressingModeInfo info = GetAddressingModeInfo(line, nextStart, &nextStart);
        if (IsValidInstruction(mnemonic, info))
            GenerateCode(mnemonic, info);
        else
            throw BadInstructionException(mnemonic, info);
    }
    else if (IsDirective(mnemonic, mnemonicLen)) {
        /* etc. */
    }
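One way a GetAddressingModeInfo-style step could start for the 8051 is by classifying each operand string by its shape. This is a hypothetical sketch: Mode, OperandInfo and ClassifyOperand are not from the answer, only a subset of 8051 operand forms is covered, and symbolic operands would first have to go through the symbol table or an expression evaluator.

```cpp
#include <cctype>
#include <stdexcept>
#include <string>

// Subset of 8051 operand forms.
enum class Mode { Accumulator, Register, Indirect, Immediate, Direct };

struct OperandInfo {
    Mode mode;
    int value;  // register number for Rn / @Ri, otherwise the numeric value
};

// Classifies one trimmed operand, e.g. "A", "R3", "@R0", "#0x12", "0x30".
// Numbers use std::stoi base auto-detection (decimal or 0x hex); symbols are not handled.
OperandInfo ClassifyOperand(const std::string& op) {
    // True if s is "R<n>" with 0 <= n <= maxReg.
    auto isReg = [](const std::string& s, int maxReg) {
        return s.size() == 2 && (s[0] == 'R' || s[0] == 'r') &&
               s[1] >= '0' && s[1] <= '0' + maxReg;
    };
    if (op == "A" || op == "a") return {Mode::Accumulator, 0};
    if (isReg(op, 7)) return {Mode::Register, op[1] - '0'};
    if (op.size() == 3 && op[0] == '@' && isReg(op.substr(1), 1))
        return {Mode::Indirect, op[2] - '0'};
    if (!op.empty() && op[0] == '#') return {Mode::Immediate, std::stoi(op.substr(1), nullptr, 0)};
    if (!op.empty() && std::isdigit(static_cast<unsigned char>(op[0])))
        return {Mode::Direct, std::stoi(op, nullptr, 0)};
    throw std::invalid_argument("unrecognised operand: " + op);
}
```

The resulting mode can then be checked against the instruction table, so an invalid combination such as @R2 is rejected in one place instead of in every mnemonic's branch.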

