Before starting microarchitecture-level tuning, make sure the processor is highly utilized while running the application workload. See the System-level Tuning and Application-level Tuning topics for information on ensuring high processor utilization.
The typical tuning methodology recommended for multiple Pentium(R) 4 processors with Hyper-Threading Technology is as follows:
First, follow the steps in Tuning for Pentium(R) 4 Processor Systems (Single-Processor) to find general Intel(R) Pentium(R) 4 processor optimizations.
Next, follow the steps in Tuning for Multiple Pentium(R) 4 Processors (DP/MP) to find dual-processor/multi-processor optimizations.
Finally, follow the steps below to find optimizations related to Hyper-Threading Technology.
If you do not have access to these systems in the order listed above, you can still tune in a different order and get some benefit, but the process will be more difficult and time-consuming and will probably yield less speedup.
Why should you tune on a single Pentium 4 processor, and then for multiple Pentium 4 processors, before tuning for a system with Hyper-Threading Technology? To get the greatest benefit from a processor with Hyper-Threading Technology, the application first needs good performance on a Pentium 4 processor without Hyper-Threading Technology and good scaling on a dual-processor system. For example, even if the processor is fully utilized and the application scales well on a dual-processor system, several coding pitfalls may be consuming Pentium 4 processor resources unnecessarily, such as a large impact from stores that fail to forward to subsequent loads, assists for denormal numbers, and so on. This is why it is important to tune for the Pentium 4 processor first.

Once the application is tuned for the Pentium 4 processor microarchitecture, find out how well it scales on two physical processors. This step is also critical. For example, if you get only a 20% speedup when using two processors, you cannot expect much benefit from a processor with Hyper-Threading Technology. As a general rule of thumb, you should see a speedup greater than 1.5x on a two-processor system before proceeding to tuning for Hyper-Threading Technology.
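The rule of thumb above is simple arithmetic on elapsed times. The following is a minimal sketch, assuming you have measured the wall-clock time of the same workload on one processor and on two physical processors; the function names `speedup` and `ready_for_ht_tuning` are hypothetical helpers, not part of any tool.

```python
def speedup(t_one_proc: float, t_two_proc: float) -> float:
    """Speedup of the two-processor run over the one-processor run (elapsed times)."""
    if t_one_proc <= 0 or t_two_proc <= 0:
        raise ValueError("elapsed times must be positive")
    return t_one_proc / t_two_proc


def ready_for_ht_tuning(t_one_proc: float, t_two_proc: float,
                        threshold: float = 1.5) -> bool:
    """Apply the rule of thumb: require more than `threshold` speedup on two processors."""
    return speedup(t_one_proc, t_two_proc) > threshold


# A 20% speedup (100 s on one processor, about 83.3 s on two) does not qualify.
print(ready_for_ht_tuning(100.0, 100.0 / 1.2))  # False
# About a 1.67x speedup (100 s vs. 60 s) clears the 1.5x threshold.
print(ready_for_ht_tuning(100.0, 60.0))  # True
```

The threshold is a parameter so you can tighten or relax it; the 1.5x default simply mirrors the rule of thumb stated in this topic.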
Choose one of the options below, then collect sampling data using the specified systems, binaries, and event groups. See the section Collecting sampling data prior to tuning to learn how to collect the data. If you collect Activity results on multiple systems, use Pack and Go to move them all onto the same system.
Option A: Tuning for the Pentium 4 Processor with Hyper-Threading Technology
Activity Result Type | System Type | Application Binary Type | Event Group | Benefit |
---|---|---|---|---|
Primary Activity Result | Single Intel(R) Pentium(R) 4 processor with Hyper-Threading Technology | Current | Performance Tuning Events for Hyper-Threading Technology | (required step) |
Reference Activity Result | Single Intel(R) Pentium(R) 4 processor | Current | Performance Tuning Events - Primary | Reveals opportunities for speeding up the application, related to Hyper-Threading Technology. |
Reference Activity Result 2 | Dual Intel(R) Pentium(R) 4 processor | Current | Performance Tuning Events - Primary | Reveals the upper bound on performance for the system with Hyper-Threading Technology. |
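Because the dual-processor result marks the upper bound, one way to read the three Option A results together is to ask how much of the dual-processor gain the Hyper-Threading Technology system captures. The sketch below assumes you have elapsed times for the same workload on the three systems; the metric and the name `fraction_of_dp_gain` are hypothetical illustrations, not values reported by the tool.

```python
def fraction_of_dp_gain(t_single: float, t_ht: float, t_dual: float) -> float:
    """Fraction of the dual-processor speedup gain achieved with Hyper-Threading.

    All arguments are elapsed times for the same workload:
      t_single: single processor, Hyper-Threading Technology not used
      t_ht:     single processor with Hyper-Threading Technology
      t_dual:   dual processor (treated as the upper bound)
    """
    if min(t_single, t_ht, t_dual) <= 0:
        raise ValueError("elapsed times must be positive")
    ht_gain = t_single / t_ht - 1.0    # extra throughput from Hyper-Threading
    dp_gain = t_single / t_dual - 1.0  # extra throughput from a second processor
    if dp_gain <= 0:
        raise ValueError("no dual-processor speedup; improve scaling first")
    return ht_gain / dp_gain


# Example: 100 s single, 80 s with Hyper-Threading, 60 s dual processor.
ratio = fraction_of_dp_gain(100.0, 80.0, 60.0)
print(f"Hyper-Threading captures {ratio:.0%} of the dual-processor gain")
```

A low ratio combined with good dual-processor scaling suggests looking for Hyper-Threading-specific issues in the Primary Activity result; a ratio near 1 means little headroom is left relative to the dual-processor bound.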
Option B: Comparing multiple binaries (when Tuning for the Pentium 4 processor with Hyper-Threading Technology)
Activity Result Type | System Type | Application Binary Type | Event Group | Benefit |
---|---|---|---|---|
Primary Activity Result | Single Intel(R) Pentium(R) 4 processor with Hyper-Threading Technology | Current | Performance Tuning Events for Hyper-Threading Technology | (required step) |
Reference Activity Result (optional) | Single Intel(R) Pentium(R) 4 processor with Hyper-Threading Technology | Previous (before the last round of changes) | Performance Tuning Events for Hyper-Threading Technology | Reveals detailed information about the location and nature of the performance changes caused by the last round of code changes. |
Reference Activity Result 2 (optional) | Single Intel(R) Pentium(R) 4 processor with Hyper-Threading Technology | A third version of the binaries | Performance Tuning Events for Hyper-Threading Technology | Same as above, for a third version of the binaries. |
After collecting sampling data and double-clicking on the Primary Activity result in the Tuning Browser, start at a high level by selecting all processes and/or modules that you are interested in tuning.
The Intel(R) Optimization Manual for your processor can be a valuable source of information on microarchitecture-level tuning.