Effective address calculation time on 8086/8088

I have started implementing an 8086/8088 emulator with the goal of being cycle-accurate. I can understand the reasoning behind the number of clock cycles for most instructions, but I must say I am quite puzzled by the time it takes to calculate the effective address (EA).

In particular, why does calculating BP + DI or BX + SI take 7 cycles, but calculating BP + SI or BX + DI takes 8 cycles?

I could simply wait for the documented number of cycles, but I’m really interested to know why this 1-cycle difference exists (and, more generally, why so many cycles are needed to calculate the EA at all, since the EA calculation uses the ALU, and an ADD between two registers takes only 3 cycles).
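For reference, the "just wait" approach can be sketched as a small lookup table. This is a minimal illustration, not a complete timing model: it covers only the four base + index forms mentioned above, with the cycle counts quoted in this question, and the names `EA_CYCLES_BASE_INDEX` and `ea_cycles` are hypothetical.

```python
# Hypothetical lookup table: EA cost in clock cycles for [base + index]
# operands without displacement, using only the figures quoted above.
EA_CYCLES_BASE_INDEX = {
    ("BX", "SI"): 7,
    ("BP", "DI"): 7,
    ("BP", "SI"): 8,
    ("BX", "DI"): 8,
}


def ea_cycles(base: str, index: str) -> int:
    """Return the EA cost in cycles for a [base + index] operand."""
    return EA_CYCLES_BASE_INDEX[(base, index)]


if __name__ == "__main__":
    for (base, index), cycles in EA_CYCLES_BASE_INDEX.items():
        print(f"[{base} + {index}]: {cycles} cycles")
```

An emulator would add this cost to the instruction's base cycle count whenever the operand is in memory; that reproduces the timing but does not explain it, which is the point of the question.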

1 answer

Without reverse engineering the chip, I don’t think it is possible to explain the difference in cycles between [BP + SI] and [BP + DI]. (Note that it is not out of the question that someone has done, or will do, the necessary reverse engineering; it has been done for some of the Commodore 64 chips in order to create more accurate emulators.)

However, it’s fairly easy to explain why effective address calculations take so long in general. The reason is that the calculation for [BX + SI] is actually DS * 16 + BX + SI, so it performs two additions, not just one. It is also a 20-bit calculation, and the ALU is only 16 bits wide, so another addition is needed to compute the upper bits of the 20-bit physical address. Given that the equivalent of three register-to-register ADDs costs a total of 9 cycles, and that this assumes the 4-bit shift is free, the EA calculation is actually faster than the equivalent instructions.
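The arithmetic described in this answer can be sketched as follows. This only illustrates the three-addition argument; it makes no claim about how the silicon actually sequences the work. The function names are hypothetical, and the segment value is passed in explicitly rather than read from DS.

```python
def add16(a: int, b: int) -> tuple[int, int]:
    """Add two values in a 16-bit adder; return (16-bit result, carry out)."""
    total = a + b
    return total & 0xFFFF, total >> 16


def physical_address_reference(segment: int, base: int, index: int) -> int:
    """Direct form: segment * 16 + ((base + index) mod 64K), kept to 20 bits."""
    offset = (base + index) & 0xFFFF
    return ((segment << 4) + offset) & 0xFFFFF


def physical_address_16bit_steps(segment: int, base: int, index: int) -> int:
    """Same result, built from three additions no wider than 16 bits."""
    # Addition 1: base + index gives the 16-bit offset (carry is discarded).
    offset, _ = add16(base, index)
    # Addition 2: low 16 bits of (segment * 16) plus the offset.
    low, carry = add16((segment << 4) & 0xFFFF, offset)
    # Addition 3: top 4 bits of (segment * 16) plus the carry from step 2.
    high = ((segment >> 12) + carry) & 0xF
    return (high << 16) | low


if __name__ == "__main__":
    seg, bx, si = 0x1234, 0x0010, 0x0020
    print(hex(physical_address_16bit_steps(seg, bx, si)))
```

Both functions agree for any 16-bit inputs, including the case where the sum wraps around past the 1 MiB boundary, which is why the third addition for the upper bits cannot be skipped.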
