I started implementing an 8086/8088 with the goal of making it cycle-accurate. I can understand the reasoning behind the clock counts of most instructions, but I must say I am very puzzled by the time it takes to calculate the effective address (EA).
In particular, why does calculating BP + DI or BX + SI take 7 cycles, but calculating BP + SI or BX + DI takes 8 cycles?
I could just stall for the documented number of cycles, but I'm really interested to know why this 1-cycle difference exists. More generally, why does calculating the EA take so many cycles at all, given that it uses the ALU for the address computation and a register-to-register ADD takes only 3 cycles?
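For reference, the "just stall" approach amounts to a lookup on the ModR/M `mod` and `r/m` fields. The sketch below is a hypothetical helper (`ea_cycles` and `BASE_INDEX` are illustrative names, not from any library). It assumes the EA clock counts from Intel's 8086 timing table: 5 for a single base or index register, 6 for a direct address, 7 or 8 for base + index, +4 when a displacement is present, and +2 for a segment override. It only reproduces the numbers; it does not model why they differ.

```python
# Hypothetical lookup of 8086 EA clock counts, assuming Intel's published
# timing table. r/m encodings: 000=BX+SI, 001=BX+DI, 010=BP+SI, 011=BP+DI.
# BX+SI and BP+DI cost 7 clocks; BX+DI and BP+SI cost 8.
BASE_INDEX = {0b000: 7, 0b001: 8, 0b010: 8, 0b011: 7}


def ea_cycles(mod: int, rm: int, seg_override: bool = False) -> int:
    """Return the extra clocks spent computing the EA of a ModR/M memory operand."""
    if mod == 0b11:
        raise ValueError("mod=11 selects a register operand; no EA is computed")
    has_disp = mod != 0b00  # mod=01 (disp8) or mod=10 (disp16)
    if mod == 0b00 and rm == 0b110:
        cycles = 6  # direct address: disp16 only
    elif rm in BASE_INDEX:
        cycles = BASE_INDEX[rm] + (4 if has_disp else 0)  # base + index (+ disp)
    else:
        cycles = 5 + (4 if has_disp else 0)  # SI, DI, BP or BX alone (+ disp)
    return cycles + (2 if seg_override else 0)


print(ea_cycles(0b00, 0b011))  # [BP+DI]
print(ea_cycles(0b00, 0b010))  # [BP+SI]
```

Note that `mod=00, r/m=110` is special-cased because that encoding means a direct 16-bit address rather than `[BP]`; with a displacement (`mod=01`/`10`) the same `r/m` value selects `[BP+disp]` at 9 clocks.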