What is a microcoded instruction?

Some answers I have seen to questions on SO talk about microcoded instructions. I wondered what that means.

Can someone explain what they are and why they exist?

assembly cpu cpu-architecture
1 answer

A CPU reads machine code and decodes it into internal control signals that send the right data to the right execution units.

Most instructions map to one internal operation and can be decoded directly. (For example, on x86, add eax, edx just sends eax and edx to the integer ALU for an ADD operation and puts the result in eax.)

Some other single instructions do much more work. For example, x86's rep movs implements memcpy(edi, esi, ecx) and requires the CPU to loop.
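To see why rep movs needs a loop, here is a minimal Python sketch of what the byte-sized form (rep movsb) does architecturally. It assumes the direction flag is clear, so both pointers count upward. The function name rep_movsb and the flat bytearray standing in for memory are assumptions made for this example, and a real CPU is free to copy in larger chunks internally.

```python
def rep_movsb(memory, esi, edi, ecx):
    """Model of x86 `rep movsb` with DF=0: copy ecx bytes from [esi] to [edi].

    Returns the final (esi, edi, ecx) register values.
    """
    while ecx != 0:
        memory[edi] = memory[esi]  # movsb: copy one byte
        esi += 1                   # DF=0: both pointers advance
        edi += 1
        ecx -= 1                   # rep: count down until ecx reaches 0
    return esi, edi, ecx


# Hypothetical 16-byte "memory" with a source string at offset 0.
mem = bytearray(b"hello...........")
regs = rep_movsb(mem, esi=0, edi=8, ecx=5)
print(mem[8:13], regs)
```

The loop body corresponds to one movsb step, and the surrounding while corresponds to the rep prefix.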

When the instruction decoders see an instruction like that, instead of just producing internal control signals directly, they read microcode from the microcode ROM.

A microcoded instruction is one that decodes to many internal operations.
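The distinction can be sketched as a toy decoder in Python. Everything in it is hypothetical and invented for illustration: the tables DIRECT and MICROCODE_ROM, the names of the internal operations, and the contents of the stored routine. It only shows the shape of the idea: the decoder turns a simple instruction into an internal operation itself, while for a microcoded one it fetches a stored routine.

```python
# Hypothetical internal operations; all names are invented for this sketch.
DIRECT = {
    "add eax, edx": ["alu_add eax, eax, edx"],
}

# Hypothetical microcode ROM: instruction -> stored routine of internal ops.
MICROCODE_ROM = {
    "rep movsb": [
        "if ecx == 0 goto done",
        "load tmp, [esi]",
        "store [edi], tmp",
        "inc esi",
        "inc edi",
        "dec ecx",
        "goto start",
    ],
}


def decode(instruction):
    """Return (source, internal_ops) for one instruction."""
    if instruction in DIRECT:
        return "decoder", DIRECT[instruction]
    if instruction in MICROCODE_ROM:
        return "microcode ROM", MICROCODE_ROM[instruction]
    raise ValueError(f"unknown instruction: {instruction}")


for insn in ("add eax, edx", "rep movsb"):
    source, ops = decode(insn)
    print(f"{insn!r}: {len(ops)} internal op(s) from the {source}")
```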


Modern x86 CPUs always decode x86 instructions into internal micro-operations (uops). In this terminology, an instruction still does not count as “microcoded” even when add [mem], eax decodes to a load from [mem], an ALU ADD operation, and a store back into [mem]. Another example is xchg eax, edx, which decodes to 3 uops on Intel Haswell. Interestingly, these are not exactly the same kind of uops you would get from using 3 MOV instructions to do the exchange through a scratch register, because the xchg uops are not zero-latency.

On Intel / AMD CPUs, “microcoded” means that the decoders turn on the microcode sequencer to feed uops from the ROM into the pipeline, instead of producing multiple uops directly.

In current Intel CPUs, the limit on what the decoders can produce directly, without going to the microcode ROM, is 4 uops (fused-domain). AMD similarly has FastPath single or double instructions, and beyond that it is VectorPath, also known as Microcode, as explained in David Kanter's in-depth look at AMD Bulldozer, specifically the part about its decoders.
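The 4-uop limit for Intel can be written down as a tiny classifier. This is a sketch, not a model of real hardware: the names decode_path and MAX_DIRECT_UOPS are made up, the uop counts for add and xchg are the ones given earlier in this answer, and the count used for the microcoded case is an arbitrary placeholder.

```python
MAX_DIRECT_UOPS = 4  # limit stated in the text for current Intel decoders (fused-domain)


def decode_path(uop_count):
    """Say which part of the front end supplies the uops for an instruction."""
    if uop_count < 1:
        raise ValueError("this sketch assumes at least one uop per instruction")
    if uop_count <= MAX_DIRECT_UOPS:
        return "decoders"
    return "microcode sequencer"


# uop counts: 1 and 3 come from the text; 10 is a placeholder, not a measurement.
examples = {
    "add eax, edx": 1,
    "xchg eax, edx": 3,
    "some microcoded instruction": 10,
}

for name, uops in examples.items():
    print(f"{name}: {uops} uop(s) -> {decode_path(uops)}")
```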

Another example is x86's integer DIV instruction, which is microcoded even on modern Intel CPUs such as Haswell. See my answer to “Why is this C++ code faster than my hand-written assembly for testing the Collatz conjecture?” for the numbers.

FP division is also slow, but it decodes to a single uop, so it does not bottleneck the front end. If FP division is rare and not a bottleneck, it can be as cheap as multiplication. (But if execution has to wait for its result, or bottlenecks on its throughput, it is much slower.)

Integer division and other microcoded instructions can give the CPU a hard time, and create effects that make code alignment matter where it otherwise would not.


To learn more about x86 internals, see the x86 tag wiki, and especially Agner Fog's microarchitecture guide.


On some older / simpler CPUs, every instruction was effectively microcoded. For example, the 6502 executed 6502 instructions by running a sequence of internal instructions from a PLA decode ROM. This works well for a non-pipelined CPU, where the order in which the different parts of the CPU are used can vary from instruction to instruction.


Historically, “microcode” had a different technical meaning: something like the internal control signals decoded from the instruction word, especially in a CPU such as MIPS, where the instruction word mapped directly to those control signals without complex decoding. (I may have this partly wrong; I read something like this (other than in the deleted answer to this question), but could not find it again later.)
