Store_Forwarding_Blocked is a warning.
The instruction for which Store_Forwarding_Blocked is issued reads data after a previous instruction wrote data to an overlapping memory location, and the processor cannot forward the stored data directly to the read. The resulting stall occurs if either of the following is true:
The target memory addresses of the read and write instructions are not the same.
The target memory addresses of the read and write instructions are aligned to the same address, but the data read is larger than the data written.
For more information, see the Intel(R) Pentium(R) 4 processor manuals on the web.
For the Intel(R) Pentium(R) 4 processors with Streaming SIMD Extensions 3 (SSE3), only cases b) and c) above are relevant. That is, the penalty occurs only when a write of one or more small data elements is followed by a read of a larger data element from the same address.
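The two conditions listed above can be expressed as a small predicate. This is a minimal sketch that models only those two conditions. The function names `ranges_overlap` and `store_forwarding_blocked` are hypothetical, and the actual hardware rules differ between processor generations, as the SSE3 note indicates.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true if [a, a + a_size) and [b, b + b_size) share a byte. */
static bool ranges_overlap(uintptr_t a, size_t a_size, uintptr_t b, size_t b_size)
{
    return a < b + b_size && b < a + a_size;
}

/* Hypothetical model of the two conditions in the text, not of real hardware. */
static bool store_forwarding_blocked(uintptr_t store_addr, size_t store_size,
                                     uintptr_t load_addr, size_t load_size)
{
    if (!ranges_overlap(store_addr, store_size, load_addr, load_size))
        return false;               /* the read does not depend on the write */
    if (load_addr != store_addr)
        return true;                /* overlapping, but the addresses differ */
    return load_size > store_size;  /* same address, read larger than write */
}
```

Under this model, a 2-byte write followed by a 4-byte read of the same address is flagged, while a 4-byte write followed by a 4-byte read of the same address is not.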
Read data that will be manipulated by MMX(TM) technology instructions using one of the following:
The MMX technology instruction that reads a 64-bit operand (for example, MOVQ MM0, m64).
The register-memory form of any MMX technology instruction that operates on a quadword memory operand (for example, PMADDWD MM0, m64).
Write 64-bit quadwords using the MMX technology instruction that writes a 64-bit operand (for example, MOVQ m64, MM0).
This example prevents stalls by putting the reads in a separate loop, far away from the writes.
Original | Optimized |
---|---|
In each pass through the loop, this code writes two bytes to the address of array[i], writes two bytes at an offset from that address (that is, to array[i+1]), and reads four bytes from the address of array[i]. Each read causes a stall. | This code prevents the stalls by writing all the data into the two arrays in the first loop. |
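The pattern described in this table can be sketched in C as follows. This is an illustration under stated assumptions, not the original listing: the function names, the `uint16_t` element type, and the values written are all hypothetical, and an optimizing compiler may keep the data in registers and change the memory traffic in either version.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Original pattern: two 2-byte writes, then a 4-byte read of the same bytes
   in the same pass. Each read follows smaller writes to its location. */
static uint32_t sum_pairs_original(uint16_t *array, size_t n)
{
    uint32_t sum = 0;
    for (size_t i = 0; i + 1 < n; i += 2) {
        array[i] = (uint16_t)i;              /* 2-byte write */
        array[i + 1] = (uint16_t)(i + 1);    /* 2-byte write */
        uint32_t pair;
        memcpy(&pair, &array[i], sizeof pair);  /* 4-byte read */
        sum += pair;
    }
    return sum;
}

/* Optimized pattern: all writes in a first loop, all reads in a second loop. */
static uint32_t sum_pairs_optimized(uint16_t *array, size_t n)
{
    uint32_t sum = 0;
    for (size_t i = 0; i + 1 < n; i += 2) {
        array[i] = (uint16_t)i;
        array[i + 1] = (uint16_t)(i + 1);
    }
    for (size_t i = 0; i + 1 < n; i += 2) {
        uint32_t pair;
        memcpy(&pair, &array[i], sizeof pair);
        sum += pair;
    }
    return sum;
}
```

Both functions compute the same result. The separation only helps when the first loop is long enough that the writes have completed by the time the second loop reads the data.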
This example prevents stalls by putting reads far away from the writes that come before them.
Original | Optimized |
---|---|
In each pass through the loop, this code writes two bytes to the address of array[i], and reads two bytes from an offset from the address of array[i], i.e., array[i] + 1. | This code prevents the stalls by first writing all the data in one loop to the address of array[i]. |
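A rough C sketch of this second pattern is shown below. It is not the original listing: it assumes 4-byte array elements so that the offset read stays inside the element just written, and the helper and function names are hypothetical. As with any such sketch, the compiler may reorder or eliminate the memory accesses.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Write a 2-byte value at the start of a 4-byte element. */
static void write_u16(uint32_t *elem, uint16_t value)
{
    memcpy(elem, &value, sizeof value);
}

/* Read 2 bytes starting one byte past the start of the element. */
static uint16_t read_u16_at_offset1(const uint32_t *elem)
{
    uint16_t value;
    memcpy(&value, (const unsigned char *)elem + 1, sizeof value);
    return value;
}

/* Original pattern: each read at an offset directly follows the write. */
static uint32_t checksum_original(uint32_t *array, size_t n)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        write_u16(&array[i], (uint16_t)(i * 257u));
        sum += read_u16_at_offset1(&array[i]);
    }
    return sum;
}

/* Optimized pattern: all writes first, then all offset reads. */
static uint32_t checksum_optimized(uint32_t *array, size_t n)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++)
        write_u16(&array[i], (uint16_t)(i * 257u));
    for (size_t i = 0; i < n; i++)
        sum += read_u16_at_offset1(&array[i]);
    return sum;
}
```

Because each element is written and read independently, both versions return the same value for the same initial array contents.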