An assembler, like any other "compiler", is best written as a lexical analyzer that feeds a grammar processor for the language.
Assembly language is usually simpler than regular compiled languages, since you don't have to worry about constructs that cross line boundaries, and the format is usually fixed.
I wrote an assembler for a (fictional) processor about two years ago for educational purposes, and it basically treated each line as:
- optional label (e.g. :loop).
- operation (e.g. mov).
- operands (e.g. ax,$1).
The easiest way to do this is to make sure tokens are easily distinguishable.
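That's why I made the rule that labels must start with a colon (:); this made each line much easier to parse. Processing a line then follows the same layout: take the label if one is present, treat the next word as the operation, and treat the rest as the operands.
All of this assumes you control the input format. If you have to accept Intel or AT&T syntax, tokens are not as easy to tell apart.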
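The line layout described above can be sketched roughly as follows. This is only an illustration, not the original assembler's code: the ParsedLine class and its parse method are hypothetical names, and it assumes that whitespace separates the label, the operation and the operands.

```java
/** Hypothetical sketch: splits one source line into label, operation and operands. */
final class ParsedLine {
    final String label;     // "" when the line has no label
    final String opcode;    // "" when the line is blank or holds only a label
    final String operands;  // everything after the operation, trimmed

    ParsedLine(String label, String opcode, String operands) {
        this.label = label;
        this.opcode = opcode;
        this.operands = operands;
    }

    static ParsedLine parse(String line) {
        String rest = line.trim();
        String label = "";
        // A leading ':' marks a label, so no lookahead is needed to recognize one.
        if (rest.startsWith(":")) {
            int end = firstWhitespace(rest);
            label = rest.substring(1, end);
            rest = rest.substring(end).trim();
        }
        // The first remaining word is the operation; whatever is left is the operands.
        int end = firstWhitespace(rest);
        String opcode = rest.substring(0, end);
        String operands = rest.substring(end).trim();
        return new ParsedLine(label, opcode, operands);
    }

    /** Index of the first whitespace character, or the string length if there is none. */
    private static int firstWhitespace(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (Character.isWhitespace(s.charAt(i))) return i;
        }
        return s.length();
    }
}
```

With this sketch, ParsedLine.parse(":loop mov ax,$1") yields the label loop, the operation mov and the operands ax,$1. Because only a label can start with a colon, the parser never has to guess what the first token is.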
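My approach was to have a simple function per operation (e.g. doJmp, doCall, doRet) that handled that operation's operands.
That way doCall and doRet could each apply their own rules.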
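A per-operation dispatch of this kind might look like the sketch below. Only the names doJmp, doCall and doRet come from the text; the OperandChecks class, the check method and the operand rules themselves (jmp and call take a single label or number, ret takes nothing) are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical sketch: one validation function per operation, looked up by mnemonic. */
final class OperandChecks {
    // Maps each mnemonic to the function that validates its operands.
    private static final Map<String, Function<String, String>> HANDLERS = new HashMap<>();
    static {
        HANDLERS.put("jmp", OperandChecks::doJmp);
        HANDLERS.put("call", OperandChecks::doCall);
        HANDLERS.put("ret", OperandChecks::doRet);
    }

    /** Returns null when the operands are acceptable, otherwise an error message. */
    static String check(String opcode, String operands) {
        Function<String, String> handler = HANDLERS.get(opcode);
        if (handler == null) return "unknown operation: " + opcode;
        return handler.apply(operands.trim());
    }

    private static String doJmp(String operands) { return requireTarget("jmp", operands); }

    private static String doCall(String operands) { return requireTarget("call", operands); }

    // Assumed rule: ret accepts no operands at all.
    private static String doRet(String operands) {
        return operands.isEmpty() ? null : "ret takes no operands";
    }

    // Assumed rule: a target is one label name or number, without commas or spaces.
    private static String requireTarget(String op, String operands) {
        if (operands.isEmpty()) return op + " needs a target";
        if (!operands.matches("[A-Za-z0-9_$]+")) return op + " takes a single label or number";
        return null;
    }
}
```

Keeping the rules inside each function means the line parser stays generic: it only splits the line, and the operation decides whether the operands make sense.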
For example, here is part of the encInstr function:
private static MultiRet encInstr(
boolean ignoreVars,
String opcode,
String operands)
{
if (opcode.length() == 0) return hlprNone(ignoreVars);
if (opcode.equals("defb")) return hlprByte(ignoreVars,operands);
if (opcode.equals("defbr")) return hlprByteR(ignoreVars,operands);
if (opcode.equals("defs")) return hlprString(ignoreVars,operands);
if (opcode.equals("defw")) return hlprWord(ignoreVars,operands);
if (opcode.equals("defwr")) return hlprWordR(ignoreVars,operands);
if (opcode.equals("equ")) return hlprNone(ignoreVars);
if (opcode.equals("org")) return hlprNone(ignoreVars);
if (opcode.equals("adc")) return hlprTwoReg(ignoreVars,0x0a,operands);
if (opcode.equals("add")) return hlprTwoReg(ignoreVars,0x09,operands);
if (opcode.equals("and")) return hlprTwoReg(ignoreVars,0x0d,operands);
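The hlpr... functions are helpers shared by operations that treat their operands the same way. In the excerpt, adc, add and and all call hlprTwoReg and differ only in the opcode byte they pass (0x0a, 0x09 and 0x0d).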
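To show why one shared helper is enough for such operations, here is a minimal sketch of a two-register encoder. It is not the original hlprTwoReg: the TwoRegEncoder class, the register names r0 to r15 and the two-byte encoding (opcode byte, then both register numbers packed into one byte) are all assumptions, and it returns a plain byte array instead of the MultiRet type used in the excerpt.

```java
/** Hypothetical sketch of a helper shared by all two-register operations. */
final class TwoRegEncoder {
    /**
     * Encodes "rA,rB" for the given opcode as two bytes: the opcode itself,
     * then rA in the high nibble and rB in the low nibble (assumed layout).
     */
    static byte[] encodeTwoReg(int opcode, String operands) {
        // A limit of -1 keeps trailing empty fields, so "r1,r2," is rejected.
        String[] parts = operands.split(",", -1);
        if (parts.length != 2) {
            throw new IllegalArgumentException("expected two registers: " + operands);
        }
        int first = parseRegister(parts[0].trim());
        int second = parseRegister(parts[1].trim());
        return new byte[] { (byte) opcode, (byte) ((first << 4) | second) };
    }

    // Assumed register syntax: r0 .. r15.
    private static int parseRegister(String name) {
        if (!name.matches("r\\d{1,2}")) {
            throw new IllegalArgumentException("not a register: " + name);
        }
        int n = Integer.parseInt(name.substring(1));
        if (n > 15) {
            throw new IllegalArgumentException("no such register: " + name);
        }
        return n;
    }
}
```

Under these assumptions, encodeTwoReg(0x09, "r1,r2") produces the bytes 0x09 0x12, and the same call with 0x0a or 0x0d would encode adc or and without any extra parsing code.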
The ignoreVars flag is passed through to every helper.