The value that is retrieved from the memory is written to the register file during the writeback phase of the pipeline. Writing to the register file occurs in the first half of the clock cycle, while reading from the register file occurs in the second half of the clock cycle.
Thus, a value written to the register file can be read in the same clock cycle in which it is written, so forwarding does not help here.
As for the number of stalls required, you need to insert two bubbles into the pipeline, since the lw instruction must be in the writeback stage when the beq instruction is in the decode stage.
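To check the count, here is a minimal Python sketch of the timing argument. It assumes the classic five-stage pipeline (IF, ID, EX, MEM, WB), that beq reads its operands in the decode stage, and that the register file is written in the first half of a cycle and read in the second half. The function name `stalls_needed` and its parameters are made up for illustration.

```python
# Assumed classic five-stage pipeline; stage order matters for the arithmetic.
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def stalls_needed(producer_stage, consumer_stage, distance=1):
    """Bubbles needed so the consumer is in consumer_stage no earlier than
    the cycle in which the producer is in producer_stage.

    distance is how many instructions later the consumer is issued
    (1 means it immediately follows the producer).
    """
    # The producer enters IF in cycle 1, so it is in stage i (0-based) in cycle i + 1.
    producer_cycle = STAGES.index(producer_stage) + 1
    # Without stalls, the consumer is `distance` cycles behind the producer.
    consumer_cycle = distance + STAGES.index(consumer_stage) + 1
    return max(0, producer_cycle - consumer_cycle)

# lw writes the register in WB; beq reads it in ID, one instruction later.
print(stalls_needed("WB", "ID", distance=1))
```

With lw in WB in cycle 5 and beq reaching ID in cycle 3 if nothing is stalled, the difference is the two bubbles mentioned above. If one independent instruction sits between lw and beq (`distance=2`), the same calculation gives one bubble.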
Hope this answers your question.