No mention of software pipelining, but that is essentially our most-used technique:
|
If other independent instructions are present, it may be possible to reorder these between the two dependent instructions, hiding the RAW latency under other useful instructions. In some cases, loop unrolling is coupled to this technique, to provide independent instructions for the reordering. |
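
To make the quoted description concrete, here is a minimal sketch in C of unrolling a loop so that independent instructions are available to fill the gap between dependent ones. The function names `dot_serial` and `dot_unrolled` and the unroll factor of 4 are assumptions chosen for illustration, not taken from the quoted text. In practice the compiler or an out-of-order core may already do some of this reordering, so treat it as an illustration of the idea rather than a guaranteed speedup.

```c
#include <stddef.h>

/* Baseline: each add reads the `acc` written by the previous iteration,
 * so the loop is one long read-after-write (RAW) dependency chain. */
float dot_serial(const float *a, const float *b, size_t n)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++)
        acc += a[i] * b[i];
    return acc;
}

/* Unrolled by 4 (hypothetical factor) with one accumulator per lane.
 * The four updates in the loop body do not depend on each other, so
 * they can be scheduled between the two dependent updates of any
 * single accumulator, hiding that accumulator's RAW latency. */
float dot_unrolled(const float *a, const float *b, size_t n)
{
    float acc0 = 0.0f, acc1 = 0.0f, acc2 = 0.0f, acc3 = 0.0f;
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        acc0 += a[i]     * b[i];
        acc1 += a[i + 1] * b[i + 1];
        acc2 += a[i + 2] * b[i + 2];
        acc3 += a[i + 3] * b[i + 3];
    }

    /* Remainder loop for lengths that are not a multiple of 4. */
    for (; i < n; i++)
        acc0 += a[i] * b[i];

    /* Combine the partial sums once, outside the hot loop. */
    return (acc0 + acc1) + (acc2 + acc3);
}
```

Note that splitting the sum across several accumulators changes the order of the floating-point additions, so for general inputs the two functions can differ in the last bits. That is the usual price of this transformation, and the reason compilers do not apply it to floating-point reductions unless told that reassociation is acceptable.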