Instructions that Require Slow Decoding

The slow length decoder is activated when an instruction has a length-changing prefix (LCP), has a false LCP, or causes double slow decoding activation.

The following sections explain these scenarios and provide examples of alternative assembly code that does not require slow decoder activation.

Instructions with Length Changing Prefix (LCP)

Instructions with LCP change their length according to two prefixes: the operand-size override prefix (0x66) and the address-size override prefix (0x67).

For example, the following instruction, encoded as (35 FF FF 00 00), is:

xor eax,0xffff

The same instruction encoded as (66 35 FF FF) has a different size (the instruction size calculation does not include the 66 prefix):

xor ax,0xffff

The instruction length decoder of the Intel(R) Core(TM) Solo and Intel(R) Core(TM) Duo processors cannot decode the length of an LCP instruction in one cycle. It therefore initiates slow decoding, which takes five extra cycles to complete.
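The effect of the 66 prefix on instruction length can be sketched in C. This is a toy model only, not the processor's decoder: the function name xor_imm_length is hypothetical, it recognizes only opcode 0x35 in 32-bit code, and it returns the total length including the prefix byte (so the prefixed form reports 4 bytes, of which 3 are counted when the prefix is excluded).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy length decoder (hypothetical helper, for illustration only).
   Handles just opcode 0x35 (XOR eAX, imm) as encoded in 32-bit code.
   Returns the total encoded length in bytes, including a 0x66 prefix
   if present, or 0 if the bytes are not recognized or are truncated. */
size_t xor_imm_length(const uint8_t *bytes, size_t avail)
{
    size_t pos = 0;
    size_t imm_size = 4;    /* default immediate size in 32-bit code */

    assert(bytes != NULL);

    /* The operand-size prefix changes the immediate from 4 to 2 bytes.
       The decoder cannot know the length until it has seen the prefix. */
    if (pos < avail && bytes[pos] == 0x66) {
        imm_size = 2;
        pos++;
    }

    if (pos >= avail || bytes[pos] != 0x35)
        return 0;           /* not the opcode this sketch models */
    pos++;                  /* consume the opcode byte */

    if (avail - pos < imm_size)
        return 0;           /* immediate is truncated */

    return pos + imm_size;
}
```

Passing the five bytes 35 FF FF 00 00 yields a length of 5, while 66 35 FF FF yields 4: the same opcode byte is followed by a different number of immediate bytes depending on the prefix.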

Advice:

Avoid using instructions with immediate values that require a length-changing prefix. The most common case is a 16-bit immediate in 32-bit code.

You can use the VTune(TM) Performance Analyzer to count the number of slow decoder activations by using the LCP stall event.

LCP Instruction Sample

The following C code stores a constant 0x5000 to a short variable:

short a;

void foo()
{
    a = 0x5000;
}

The following table provides an example of assembly with an LCP stall and alternative code without the stall.

Assembly Alternative 1: with LCP Instruction, 2 Instructions

mov word ptr a,0x5000
ret

Assembly Alternative 2: No LCP, 3 Instructions

mov eax,0x5000
mov word ptr a,ax
ret

Performance of Alternative 2 relative to Alternative 1: 400%

Instructions with a False LCP

The slow length decoder is activated when processing instructions with a false LCP. This happens in the following cases:

Advice:

The following C code negates a 16-bit value.

short a;

void foo()
{
    a = -a;
}

The following table provides an example of assembly with a false LCP stall and alternative code without the stall:

Assembly Alternative 1: False LCP, 2 Instructions

neg word ptr a
ret

Assembly Alternative 2: No LCP, 4 Instructions

movsx eax,word ptr a
neg eax
mov word ptr a,ax
ret

Performance of Alternative 2 relative to Alternative 1: 181%
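The two alternatives store the same 16-bit result, which can be checked with a small C model. This is a sketch that mimics the arithmetic of each instruction sequence, not compiler output; the function names neg16 and neg_via_32 are hypothetical.

```c
#include <stdint.h>

/* Models "neg word ptr a": negation performed within 16 bits. */
uint16_t neg16(uint16_t x)
{
    return (uint16_t)(0u - x);
}

/* Models Alternative 2: sign-extend to 32 bits, negate the 32-bit
   value, then store only the low 16 bits. */
uint16_t neg_via_32(uint16_t x)
{
    /* movsx eax, word ptr a: copy bit 15 into the upper 16 bits */
    uint32_t wide = (x & 0x8000u) ? (0xFFFF0000u | x) : x;

    /* neg eax: 32-bit two's complement negation */
    wide = 0u - wide;

    /* mov word ptr a, ax: keep the low 16 bits */
    return (uint16_t)wide;
}
```

Both functions agree for every 16-bit input because the low 16 bits of a two's complement negation depend only on the low 16 bits of the operand, so the wider 32-bit operation does not change the stored value.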

Advice:

Avoid using 16-bit variants of 0xF7 instructions.

Instructions Which Cause Double Slow Decoding Activation

The slow length decoder is activated twice when:

Double activation of the slow decoder creates an 11-cycle decode bubble instead of the five-cycle bubble caused by a single slow decoding operation.
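As a rough worked example of these figures, the following C sketch estimates the decode cycles lost to slow decoding from activation counts. It assumes the bubbles are simply additive and do not overlap with other stalls, which real hardware does not guarantee; the function name decode_bubble_cycles and the constant names are hypothetical.

```c
/* Cycle costs quoted in the text for a single and a double activation. */
enum {
    SINGLE_SLOW_DECODE_CYCLES = 5,
    DOUBLE_SLOW_DECODE_CYCLES = 11
};

/* Estimates total decode bubble cycles, assuming each activation
   contributes its full cost independently (an illustrative
   simplification, not a hardware model). */
unsigned long decode_bubble_cycles(unsigned long single_activations,
                                   unsigned long double_activations)
{
    return single_activations * SINGLE_SLOW_DECODE_CYCLES
         + double_activations * DOUBLE_SLOW_DECODE_CYCLES;
}
```

Under this model, 100 single activations and 10 double activations cost 100 * 5 + 10 * 11 = 610 cycles. Note that one double activation (11 cycles) costs more than two separate single activations (10 cycles).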