About the Overall Tuning Methodology

The general tuning methodology begins at the system level and works down to the microarchitecture level. Regardless of your specific tuning goals, conduct the analysis level by level, from high to low, in the following order:

  1. System-level

  2. Application-level

  3. Microarchitecture-level

First, make sure you have no system-level bottlenecks. Once the processor is highly utilized, focus on application-level bottlenecks, and then on microarchitecture-level bottlenecks.

As a general rule, the same time investment yields a greater speedup at a higher level than at a lower level, so working through the levels in this order gives the best speedup in the shortest amount of time.

For example, suppose you start with microarchitecture tuning while system-level bottlenecks keep the processor busy only 10% of the time. Even if you cut processor time by 50%, the workload as a whole finishes only about 5% sooner, because the processor accounts for just 10% of the workload's run time.
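The arithmetic behind this example can be sketched with Amdahl's law. The snippet below is a minimal illustration and not part of the VTune tooling; the function names `workload_speedup` and `time_saved` and their parameters are hypothetical. It assumes that only the processor-bound fraction of the run benefits from the tuning, and it treats the 50% improvement as a halving of processor time.

```python
def workload_speedup(busy_fraction: float, local_speedup: float) -> float:
    """Overall speedup when only `busy_fraction` of the run time
    becomes `local_speedup` times faster (Amdahl's law)."""
    if not 0.0 <= busy_fraction <= 1.0:
        raise ValueError("busy_fraction must be between 0 and 1")
    if local_speedup <= 0.0:
        raise ValueError("local_speedup must be positive")
    # Run time is normalized to 1.0; only the busy part shrinks.
    new_time = (1.0 - busy_fraction) + busy_fraction / local_speedup
    return 1.0 / new_time


def time_saved(busy_fraction: float, local_speedup: float) -> float:
    """Fraction of the original run time that the tuning removes."""
    return 1.0 - 1.0 / workload_speedup(busy_fraction, local_speedup)


if __name__ == "__main__":
    # Processor busy 10% of the time, processor work made 2x faster.
    print(f"10% utilization: {time_saved(0.10, 2.0):.1%} of run time saved")
    # Same tuning effort once the processor is busy 90% of the time.
    print(f"90% utilization: {time_saved(0.90, 2.0):.1%} of run time saved")
```

Under these assumptions, the same microarchitecture-level change pays off far more once higher-level bottlenecks have been removed and the processor is busy most of the time.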

Before you begin tuning, and again after each major change during tuning, check your application’s processor utilization to determine whether the application is currently processor-intensive, I/O-intensive, or somewhere in between.
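One rough way to take this reading without a profiler is to compare the processor time a process consumes with the wall-clock time that elapses. The sketch below uses only the Python standard library; the helper names `utilization` and `measure` are hypothetical, and the two sample workloads are stand-ins chosen for illustration. It measures the current process only, so it is a substitute for neither a system-wide view nor the VTune analyzer.

```python
import time


def utilization(cpu_seconds: float, wall_seconds: float) -> float:
    """Processor time consumed per second of elapsed time."""
    if wall_seconds <= 0.0:
        raise ValueError("wall_seconds must be positive")
    return cpu_seconds / wall_seconds


def measure(workload) -> float:
    """Run `workload()` and return its processor utilization.

    time.process_time() counts user plus system CPU time of the whole
    process, so a multithreaded workload can report more than 1.0.
    """
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    workload()
    cpu_used = time.process_time() - cpu_start
    wall_used = time.perf_counter() - wall_start
    return utilization(cpu_used, wall_used)


if __name__ == "__main__":
    # Sleeping stands in for time spent waiting on I/O.
    print(f"waiting:   {measure(lambda: time.sleep(0.2)):.0%}")
    # A tight arithmetic loop stands in for processor-intensive work.
    print(f"computing: {measure(lambda: sum(i * i for i in range(2_000_000))):.0%}")
```

A reading near 100% suggests a processor-intensive run, while a low reading suggests the process spends most of its time waiting.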

If processor utilization is:

There are three main strategies for improving application performance.  Each strategy has an effect on processor utilization.

After some system-level or application-level tuning, processor utilization may increase, and you may find that your application is ready for microarchitecture-level tuning. Conversely, after some application-level or microarchitecture-level tuning, processor utilization may decrease, and you may find that more system-level tuning is warranted.
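A simple two-part model shows why utilization moves in both directions. The sketch below is an illustration under stated assumptions: run time is split into processor time and waiting time that do not overlap, and the function name `utilization_after` and all numbers are hypothetical rather than measured.

```python
def utilization_after(cpu_s: float, wait_s: float,
                      cpu_speedup: float = 1.0,
                      wait_speedup: float = 1.0) -> float:
    """Processor utilization after speeding up the processor part,
    the waiting part, or both, in a non-overlapping two-part model."""
    if cpu_s < 0.0 or wait_s < 0.0 or cpu_s + wait_s == 0.0:
        raise ValueError("times must be non-negative and not both zero")
    if cpu_speedup <= 0.0 or wait_speedup <= 0.0:
        raise ValueError("speedups must be positive")
    new_cpu = cpu_s / cpu_speedup
    new_wait = wait_s / wait_speedup
    return new_cpu / (new_cpu + new_wait)


if __name__ == "__main__":
    # 10 s of processor work and 90 s of waiting: 10% utilization.
    before = utilization_after(10.0, 90.0)
    # System-level tuning cuts the waiting time to a third.
    after_system = utilization_after(10.0, 90.0, wait_speedup=3.0)
    print(f"system-level tuning: {before:.0%} -> {after_system:.0%}")

    # 60 s of processor work and 40 s of waiting: 60% utilization.
    before = utilization_after(60.0, 40.0)
    # Microarchitecture-level tuning makes processor work 1.5x faster.
    after_micro = utilization_after(60.0, 40.0, cpu_speedup=1.5)
    print(f"microarchitecture tuning: {before:.0%} -> {after_micro:.0%}")
```

In this model, removing waiting time raises utilization, while making processor work faster lowers it, which is why the levels may need to be revisited in turn.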

Experience acquired by performance analysts at Intel indicates that speedups in the 3x range are very common with system-level tuning, 2x with application-level tuning, and 1.1x to 1.5x with microarchitecture-level tuning.

You can benefit from several system-level optimizations that almost always positively impact the overall application performance.

For example:

Tuning Goals and Areas to Investigate

| Order | Tuning Level | Goals | Key Areas to Investigate | Estimated Speedup |
|-------|--------------|-------|--------------------------|-------------------|
| 1 | High: System-level | Speed up the application by improving how the application interacts with the system | Network Problems; Disk Performance; Memory Usage | 3X |
| 2 | Medium: Application-level | Speed up the application by improving the application's algorithms | Locks; Heap Contention; Threading Algorithm; API Usage | 2X |
| 3 | Low: Microarchitecture-level | Speed up the application by improving how the application runs on the specific processors | Architecture Coding Pitfalls; Data/Code Locality (Cache); Data Alignment | 1.1-1.5X |

Use the VTune(TM) Performance Analyzer to implement these tuning methodologies to achieve the most performance gain with the least effort.