Store(s) could not be forwarded to load(s) due to alignment problems, which can cause a stall the length of the pipeline. Note: Using the latest compilers from Microsoft or Intel will usually decrease or eliminate these types of events.
For systems with Hyper-Threading Technology enabled, or for multiprocessor systems, the estimated impact represents the total processor time (summed across all logical and physical processors in the system), not the "wall-clock" time. On such systems it is therefore quite possible for an insight to show an impact greater than the workload's wall-clock run time. On a uniprocessor (UP) system, processor time is the same as wall-clock time.
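The difference between summed processor time and wall-clock time can be illustrated with a short sketch. The function name and the sample figures below are hypothetical and chosen only for illustration; they are not measured data.

```python
def total_processor_time_impact(per_processor_impact_seconds):
    """Sum the estimated impact over all logical/physical processors."""
    return sum(per_processor_impact_seconds)

# Hypothetical run: 1.0 s of wall-clock time on two logical processors,
# each of which loses part of that time to the reported condition.
wall_clock_seconds = 1.0
per_processor_impact = [0.6, 0.7]

total = total_processor_time_impact(per_processor_impact)

# The summed impact can exceed the wall-clock run time.
print(total > wall_clock_seconds)
```

With a single processor the list has one entry, so the total can never exceed the wall-clock run time.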
One way the Intel(R) Pentium(R) 4 processor achieves high performance is by optimistically assuming that a condition holds when doing so could improve performance. For example, it predicts a branch's outcome and speculatively executes down one path before the branch is resolved. It also executes some memory operations out of order. Some memory-related performance monitoring events count speculative as well as non-speculative actions, and therefore produce larger counts than they would if speculative actions were excluded.
Consider the case where the processor speculatively executes a load out of order. Suppose the processor has the wrong load address because the code is pointer chasing and the correct address is not yet available. If the load causes a store-forwarding violation, the MOB Load Replays Retired event increments, even though the event would not have occurred without the speculation. From an architectural perspective this looks like over-counting: the instruction is reported as having experienced a store-forwarding violation that it would not have experienced without the speculation. At the microarchitectural level, however, the count is accurate: this instruction did cause a store-forwarding violation.
Counter dependencies:
This insight is dependent on the following performance counter function:
Store Forward Performance Impact = ((MOB Loads Replays Retired*50)/Clockticks)*100
low value: 0.2
high value: 2
This insight is relevant when Store Forward Performance Impact is high.
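As a rough sketch, the counter function and its thresholds can be evaluated as follows. The function names, the sample counts, and the handling of values that fall exactly on or between the low and high thresholds are assumptions made for illustration. The constant 50 is taken directly from the formula above.

```python
# Constants taken from the formula and thresholds in this insight.
REPLAY_MULTIPLIER = 50   # factor applied to each replay event in the formula
LOW_VALUE = 0.2
HIGH_VALUE = 2.0

def store_forward_impact(mob_loads_replays_retired, clockticks):
    """Return Store Forward Performance Impact as a percentage."""
    if clockticks <= 0:
        raise ValueError("clockticks must be positive")
    return ((mob_loads_replays_retired * REPLAY_MULTIPLIER) / clockticks) * 100

def classify(impact):
    # Assumption: the boundary handling (>= high, <= low) and the
    # "moderate" label for in-between values are illustrative only.
    if impact >= HIGH_VALUE:
        return "high"
    if impact <= LOW_VALUE:
        return "low"
    return "moderate"

# Hypothetical sample counts, not measured data.
impact = store_forward_impact(mob_loads_replays_retired=1_000_000,
                              clockticks=2_000_000_000)
print(f"{impact:.2f}% -> {classify(impact)}")
```

With these sample counts the impact works out to 2.5, which is above the high value, so the insight would be considered relevant.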