Intel(R) Tuning Assistant Topic

Second-Level Cache Load Misses

The 2nd-level cache load misses event has several errata, explained below, that make it difficult to get an accurate measure of the true impact of second-level cache load misses. A better event to use is "Non-prefetch Read Requests Underway from The Processor", which is already part of the "Primary Performance Tuning Events" group. The Intel(R) Tuning Assistant reports the impact measured by this event in the insight called "On-Chip cache load misses". If you have already sampled with the Primary Performance Tuning Events group and the "Second-level cache load misses" insight appears but the "On-Chip cache load misses" insight does not, one or more of the errata below is most likely causing the event to over-count.

Second-level cache load misses may be having a significant negative impact on performance. This happens when the working data set is too large to fit into the second-level cache. When Hyper-Threading Technology is enabled, the second-level cache is shared between the two logical processors, so the working data set should be sized to fit into 1/4 to 1/2 of the second-level cache. On systems without Hyper-Threading Technology enabled, the working data set should be sized to fit into 1/2 of the second-level cache. Remember that the penalty for a second-level cache miss is much larger than the penalty for a first-level cache miss.
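As a rough illustration of the sizing guideline above, the following Python sketch turns a second-level cache size into a target working-set range. The function name and the 512 KB cache size in the usage are hypothetical examples, not values taken from any particular processor.

```python
def target_working_set_bytes(l2_cache_bytes, hyperthreading_enabled):
    """Return (min_bytes, max_bytes) for the working-set target.

    Hyper-Threading enabled: aim for 1/4 to 1/2 of the second-level cache.
    Hyper-Threading disabled: aim for 1/2 of the second-level cache.
    """
    if l2_cache_bytes <= 0:
        raise ValueError("l2_cache_bytes must be positive")
    if hyperthreading_enabled:
        # The cache is shared by two logical processors, so target a smaller share.
        return l2_cache_bytes // 4, l2_cache_bytes // 2
    half = l2_cache_bytes // 2
    return half, half


if __name__ == "__main__":
    L2_BYTES = 512 * 1024  # hypothetical 512 KB second-level cache
    print(target_working_set_bytes(L2_BYTES, True))   # (131072, 262144)
    print(target_working_set_bytes(L2_BYTES, False))  # (262144, 262144)
```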

On systems with Hyper-Threading Technology enabled, or on multiprocessor systems, the estimated impact represents total processor time (summed across all logical and physical processors on the system), not wall-clock time. On such systems, an insight can therefore show an impact greater than the workload's wall-clock run time. On a uniprocessor (UP) system, processor time is the same as wall-clock time.
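A minimal sketch of why the reported impact can exceed wall-clock time: the per-processor impacts are added together. The four-processor figures in the usage are made-up numbers chosen only to show the effect, and the function name is hypothetical.

```python
def total_processor_time_impact(per_processor_impact_seconds):
    """Sum the estimated impact across all logical/physical processors.

    The result is processor time, not wall-clock time, so on a system with
    several processors it can exceed the workload's wall-clock run time.
    """
    return sum(per_processor_impact_seconds)


if __name__ == "__main__":
    wall_clock_seconds = 10.0
    # Hypothetical per-processor impacts on a 4-processor system, in seconds.
    impacts = [3.0, 3.5, 2.5, 4.0]
    total = total_processor_time_impact(impacts)
    print(total, total > wall_clock_seconds)  # 13.0 True
```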

Note

This indicator is calculated under the assumption that there is no third-level cache. The estimated impact can essentially be ignored if the data being analyzed was captured on a large-cache Intel(R) processor with a third-level cache.

One of the ways the Intel(R) Pentium(R) 4 processor achieves high performance is by optimistically assuming that a condition holds which could lead to greater performance. For example, it predicts a branch's outcome and speculatively executes down one path before the branch is resolved, and it executes some memory operations out of order. Some memory-related performance monitoring events count speculative actions as well as non-speculative ones, and therefore produce larger counts than they would if only non-speculative actions were counted. Executing memory operations out of order leads to more speculative actions, which are counted as events, and so can inflate these counts.

For example, suppose the processor speculatively executes a load out of order. The load misses the cache when executed early, but would have hit the cache if executed later, in program order. This can happen when an earlier access to the same cache line was still underway at the earlier time but had filled the line into the cache by the later time. Because of the speculation, a cache miss occurred that would not have occurred otherwise. Viewed from an architectural perspective, this looks like over-counting: the instruction is reported as having missed the cache only because of speculative activity. At the microarchitectural level, however, the count is accurate: the instruction really did cause a cache miss.

In some rare cases the over-counting can be significant. These cases usually involve pointer-chasing code, in which the address of each memory reference is itself loaded from another memory location, which may in turn have been loaded from yet another location, and so on (for example, when traversing a linked list).
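The following Python sketch shows the shape of a pointer-chasing access pattern, assuming a linked structure stored as an array of "next" indices. Python does not expose cache behavior, so this only illustrates the data dependency: the address of each step is known only after the previous load completes. The chain is built with Sattolo's algorithm, a swapped-in technique chosen here so that the chain forms a single cycle through all nodes in a scattered order; the function names are hypothetical.

```python
import random


def make_pointer_chain(n, seed=0):
    """Return next_index, a list where following next_index[i] visits all n nodes.

    Built with Sattolo's algorithm, which yields a random permutation that is a
    single cycle, so the traversal order jumps around instead of walking
    sequentially through memory.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    rng = random.Random(seed)
    next_index = list(range(n))
    for i in range(n - 1, 0, -1):
        j = rng.randrange(i)  # 0 <= j < i; excluding i keeps a single cycle
        next_index[i], next_index[j] = next_index[j], next_index[i]
    return next_index


def traverse(next_index, start=0):
    """Follow the chain from start until it returns to start; return the visit order."""
    visited = [start]
    node = next_index[start]
    while node != start:
        visited.append(node)
        # The next address comes from the value just loaded: pointer chasing.
        node = next_index[node]
    return visited


if __name__ == "__main__":
    chain = make_pointer_chain(8)
    print(len(traverse(chain)))  # 8
```

In a native program with nodes scattered across memory, each step in such a loop tends to touch a different cache line, and no step can start until the previous one returns its data.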

Counter dependencies:

This insight is dependent on the following performance counter functions:

Level2CacheLoadMiss Performance Impact = ((2nd Level Cache Load Misses Retired*(15*(ProcessorSpeed/BusSpeed)))/Clockticks)*100
Low value: 0.2
High value: 2
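The Level2CacheLoadMiss Performance Impact formula above can be written as the following Python sketch. As the formula is written, each retired miss is charged 15 * (ProcessorSpeed / BusSpeed) processor clocks, and the total is expressed as a percentage of Clockticks. The parameter names and the sample counter values in the usage are hypothetical.

```python
def l2_load_miss_impact_pct(l2_load_misses_retired, processor_speed, bus_speed, clockticks):
    """Level2CacheLoadMiss Performance Impact, following the formula above.

    processor_speed and bus_speed only need to share a unit (e.g. MHz),
    because only their ratio is used.
    """
    if clockticks <= 0 or bus_speed <= 0:
        raise ValueError("clockticks and bus_speed must be positive")
    # Processor clocks charged per retired second-level cache load miss.
    penalty_clocks = 15 * (processor_speed / bus_speed)
    return (l2_load_misses_retired * penalty_clocks / clockticks) * 100


if __name__ == "__main__":
    # Hypothetical sample: 1,000,000 misses, 2000 MHz core, 100 MHz bus,
    # 10,000,000,000 clockticks.
    impact = l2_load_miss_impact_pct(1_000_000, 2000, 100, 10_000_000_000)
    print(round(impact, 2))  # 3.0
```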

Level2CacheLoad Hit Rate = (100-((2nd Level Cache Load Misses Retired/Loads Retired)*100))
Low value: 89
High value: 99
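Likewise, a sketch of the Level2CacheLoad Hit Rate formula, with hypothetical counter values in the usage. Because the denominator is all retired loads, the rate is 100 minus the percentage of retired loads that missed the second-level cache.

```python
def l2_load_hit_rate_pct(l2_load_misses_retired, loads_retired):
    """Level2CacheLoad Hit Rate, following the formula above (a percentage)."""
    if loads_retired <= 0:
        raise ValueError("loads_retired must be positive")
    return 100 - (l2_load_misses_retired / loads_retired) * 100


if __name__ == "__main__":
    # Hypothetical sample: 50,000 second-level misses out of 1,000,000 retired loads.
    print(round(l2_load_hit_rate_pct(50_000, 1_000_000), 2))  # 95.0
```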

This insight is relevant when Level2CacheLoadMiss Performance Impact is high or Level2CacheLoad Hit Rate is low.
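The relevance rule above can be sketched as a single predicate. How the Tuning Assistant interprets "high" and "low" is not stated here; this sketch assumes "high" means at or above the impact's high value (2) and "low" means at or below the hit rate's low value (89). The names are hypothetical.

```python
# Threshold values taken from the counter definitions above.
IMPACT_HIGH = 2.0    # Level2CacheLoadMiss Performance Impact, high value
HIT_RATE_LOW = 89.0  # Level2CacheLoad Hit Rate, low value


def is_insight_relevant(impact_pct, hit_rate_pct):
    """Return True when the impact is high or the hit rate is low.

    Assumption: "high" means >= IMPACT_HIGH and "low" means <= HIT_RATE_LOW.
    """
    return impact_pct >= IMPACT_HIGH or hit_rate_pct <= HIT_RATE_LOW


if __name__ == "__main__":
    print(is_insight_relevant(3.0, 95.0))  # True: impact is high
    print(is_insight_relevant(0.5, 85.0))  # True: hit rate is low
    print(is_insight_relevant(0.5, 95.0))  # False: neither condition holds
```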

Advice:

Avoiding Second-Level Cache Load Misses