Instruction-level parallelism has diminishing returns. In particular, data dependencies between instructions determine how much parallelism is possible.
Consider the read-after-write case (known as a RAW hazard in textbooks).
In a syntax where the first operand receives the result, consider this example.
    10: add r1, r2, r3
    20: add r1, r1, r1
The result of line 10 must be known by the time the calculation of line 20 begins, because line 20 reads r1, which line 10 writes. Data forwarding mitigates this problem, but ... only to the point where the data becomes known.
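To make the dependency concrete, here is a minimal sketch in Python that scans a short program written in the destination-first syntax used above and reports each read-after-write pair. The names `Instr`, `parse`, and `raw_hazards` are made up for illustration, and the model is deliberately simplified: it assumes every instruction has exactly one destination register followed by source registers, and it says nothing about pipeline timing or forwarding.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    label: int    # the line label, e.g. 10
    op: str       # mnemonic, e.g. "add"
    dest: str     # first operand: register that receives the result
    srcs: tuple   # remaining operands: registers that are read


def parse(line):
    """Parse a line such as '10: add r1, r2, r3' (hypothetical format)."""
    label, rest = line.split(":", 1)
    op, operands = rest.strip().split(None, 1)
    regs = [r.strip() for r in operands.split(",")]
    return Instr(int(label), op, regs[0], tuple(regs[1:]))


def raw_hazards(program):
    """Return (producer_label, consumer_label, register) for each RAW pair."""
    last_writer = {}  # register -> label of the most recent instruction writing it
    hazards = []
    for ins in program:
        # dict.fromkeys removes duplicate sources while keeping their order
        for reg in dict.fromkeys(ins.srcs):
            if reg in last_writer:
                hazards.append((last_writer[reg], ins.label, reg))
        # Record the write only after checking the reads of this instruction
        last_writer[ins.dest] = ins.label
    return hazards


program = [parse("10: add r1, r2, r3"), parse("20: add r1, r1, r1")]
print(raw_hazards(program))  # [(10, 20, 'r1')]
```

Each reported pair is a point where the second instruction cannot start computing until the first has produced its value, so a long chain of such pairs caps how many instructions can usefully run in parallel.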
Paul Nathan